PITFALLS IN THE USE OF “PROBABLE ERRORS”
C. SPEARMAN 


The suggestion is here ventured that in the prevalent manner of 
handling probable errors some fundamental mistakes are made. As 
outstanding material for discussing these, we will take a recent work 
by Garrett and Anastasi on “The Tetrad-difference Criterion” (Annals
of the New York Academy of Sciences, 1932). But the points at 
issue seem to apply to statistics in general. 


I. PROBABILITIES OF “SIGNIFICANCE” AND OF “INSIGNIFICANCE”


Let us begin with a disagreement which has arisen between these 
distinguished authors and myself with regard to some correlations 
obtained many years ago by Heymans and Brugmans. These correlations,
on being submitted to the new criterion of tetrad-differences (or,
more briefly, “tetrads”), produced a tetrad of .20 with a probable
error of .13. For my part, I had said that such a small tetrad with such
a large probable error could not properly be regarded as “significant.”
But now Garrett and Anastasi advance the view that this verdict of
mine was “almost certainly wrong,” on the ground that “the chances
are nearly 6 to 1 that the tetrad of .20 does represent a significant
deviation from zero.”

Here, the term “significant” is obviously taken in its usual sense of
“true.” That is to say, a deviation observed in a sample is significant
or true, if a corresponding deviation holds also for the entire population
from which the sample is drawn. Accordingly, what is here rated at
“nearly six to one” is the ratio between the probability of the true
value not being zero and the probability of its being zero. This
interpretation of their statement is borne out by their subsequent
statement that, when the ratios of tetrads to their probable
errors are respectively .77, 1.54 and 2.16, then “The chances in one
hundred that these tetrads are significant are seventy, eighty-four, and
ninety-three.” Further, this view is still more explicitly enounced
by Garrett in his general textbook on Statistics. He takes the case of
8585 adults having an average height of 67.46 inches with a σ of the
average of .0277. He says that here “The chances are 6826 in 10,000 or
68 in 100 that the obtained average of 67.46 does not diverge from the
true value by more than ±1σ.” What is yet more, words that seem to express
similar views may be found in the writings of many other statisticians.
The matter, then, seems to be worth our careful consideration.

Let us go back a little to first principles. We may lay down that in 
all calculation of probabilities the primary requirement is to divide 
up (without remainder) all possible events into a certain number of 
mutually exclusive cases. To obtain these when sampling, we may lay 
down that the observed sampling value is either significant, or else is 
not so;¹ and again, if it is not significant, it may or may not be less than
any chosen magnitude. We thus arrive at three possible cases. In the 
first of these, the observed value is significant. In the second case, it 
is insignificant and less than the chosen magnitude. In the third case, 
it is insignificant and equal to or greater than the chosen magnitude. 
These three cases may be denoted respectively by the symbols S, I<m,
I≥m (m being the chosen magnitude). The three never overlap, and
together they exhaust all possibilities.

Now I venture to assert that Garrett’s relative probability of
“nearly six to one” is really that of I<m to I≥m. Such a relative
probability does approximately occur when, as in the present case, the
ratio of the observed value to the PE is 20:13. So too his “sixty-eight
in a hundred” is really the relative probability of I<m to I<m +
I≥m. And similarly his other values. Throughout, then, he really is
only comparing different classes of insignificant or “untrue” results
with one another. The whole current claim in such situations to
compare the probability of any insignificant value with that of any
significant one would appear to be devoid of validity (in particular,
such a situation does not supply the least ground for taking the
probability of the S to equal that of either or both of the I’s). In order
to obtain the probability of S (or any division of this), we usually 
must turn away from all definite numerical proportions and content 
ourselves instead with indefinite subjective estimates. 
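
The arithmetic behind the quoted figures can be checked with a short sketch. It assumes a normal sampling distribution centred on zero, takes one probable error as 0.6745 standard deviations, and reads Garrett's figures as one-sided chances that a pure sampling deviation falls below the observed value. That reading is an assumption made here because it roughly reproduces his numbers; the function names are hypothetical.

```python
from statistics import NormalDist

# One probable error expressed in standard deviations (normal curve).
PE_PER_SIGMA = 0.6745


def chance_below(ratio_to_pe: float) -> float:
    """Chance that a zero-centred sampling deviation is less than ratio * PE.

    This is the case called I<m in the text, with m set at the observed value.
    It says nothing about the probability of S.
    """
    return NormalDist().cdf(ratio_to_pe * PE_PER_SIGMA)


def odds_below_to_above(ratio_to_pe: float) -> float:
    """Odds of I<m against I>=m for a deviation of ratio * PE."""
    p = chance_below(ratio_to_pe)
    return p / (1.0 - p)


if __name__ == "__main__":
    print("odds for 20:13 ->", round(odds_below_to_above(20 / 13), 2))
    for ratio in (0.77, 1.54, 2.16):
        print(ratio, "-> chances in 100:", round(100 * chance_below(ratio)))
```

Under these assumptions the ratio 20:13 gives odds of about 5.7 to 1, and the ratios .77 and 2.16 give about 70 and 93 in a hundred. The ratio 1.54 comes out nearer 85 than the quoted eighty-four, presumably a matter of table rounding. Every one of these is a comparison among the I cases only.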





1 For simplicity I will take this plain dichotomy. The more complicated cases 
that introduce any significance of partial or approximate order do not present any 
difference in principle. 
Let us illustrate this by a homely example. Suppose a traveller to
know that some restaurant gives pudding regularly once a week, so that 
in this sense the odds on any random day are six to one against pudding. 
Suppose further that he is ignorant as to which is the pudding day, but 
on visiting the restaurant finds the pudding there. One obvious 
hypothesis is that he has happened to come on the regular day. But 
another conceivable hypothesis would be that the restaurant has 
come under a new management which is more generous of puddings. 
The former hypothesis would imply that the presence of the pudding on 
the day of the visit was “insignificant”; that is to say, it did not signify
anything true generally; it was the case of I_p (where the suffix p
indicates pudding). The hypothesis of new management, on the
contrary, makes the presence of the pudding “significant”; it does
signify something true generally; it is the case that we have called S.

Now, in such a situation our authors, to be consistent with themselves,
would have to put the odds at six to one that there has been
a change of management; they would have to maintain that such a
change is “almost certain.”

For my part, on the contrary, I submit to the reader’s common
sense that no such conclusions are warranted. To obtain any definite
probabilities whatever between the two hypotheses (those of I_p
and of S) would, it seems to me, be a very rare occurrence. For
example, there might conceivably in our preceding situation be some
possibility of ascertaining the average frequency with which the
management of restaurants in that town had been changed. If this turned
out to be, say, once a year, then the respective chances of S and I_p
might be taken at 1:364 and 1:6. This would make not S, but on
the contrary I_p, “almost certain”! Much more usually, the odds
about S do not admit of even the wildest numerical computation.
Instead of attempting any such feat, the statistician must content
himself with arguing that the occurrence of S is at any rate probable
enough to be worth taking into serious consideration; he then tries to
render the probability of I≥m so extremely minute that this alternative
is at any rate far inferior to that of S.
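
One way to make the pudding comparison concrete is to treat the 1:364 as prior odds of a change of management and the 1:6 as the odds of pudding on a random day under the old management. Combining them by Bayes' rule is a framing supplied here, not the author's own procedure. The chance of pudding under a new management is not given in the text, so the sketch takes it as 1, the value most favourable to S; that value and the function name are assumptions.

```python
from fractions import Fraction


def posterior_change(prior_change, p_pudding_if_change, p_pudding_if_no_change):
    """Posterior probability of S (new management) given that pudding was found."""
    numerator = prior_change * p_pudding_if_change
    denominator = numerator + (1 - prior_change) * p_pudding_if_no_change
    return numerator / denominator


# Odds of 1:364 and 1:6 from the text, written as probabilities.
prior_s = Fraction(1, 365)        # management changes about once a year
pudding_regular = Fraction(1, 7)  # pudding once a week under the old management

# Assumption: a new management serves pudding every day (most favourable to S).
best_case_for_s = posterior_change(prior_s, Fraction(1), pudding_regular)

if __name__ == "__main__":
    print("posterior probability of S at most:", best_case_for_s)
```

Even on the assumption most favourable to S, its probability is no more than 1 in 53, so that the regular pudding day remains by far the likelier account.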

Quite accordingly, that sanest of statisticians, Udny Yule, declares
that “we cannot feel any great confidence that it (the result obtained
by sampling) is likely to be significant” unless it exceeds four times
the probable error; such a sampling deviation would happen, not once
in six trials, but only about once in one hundred forty-two. He
even goes so far as to propose a limit of three sigmas, or 4½ times the
probable error; this would occur as a sampling deviation only once in
about three hundred and seventy trials. Curiously enough, Garrett
himself in his own textbook writes that for a deviation to be taken as
“practically certain” it should be as much as four times the probable
error. How this former ruling is to be reconciled with his present one
that a deviation no more than 20:13 or 1.54 times is “almost certainly”
significant, remains a mystery.
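
The rarity of such deviations can be approximated as follows. The sketch assumes a normal sampling distribution, counts deviations in either direction, and takes one probable error as 0.6745 standard deviations; the function name is hypothetical.

```python
from statistics import NormalDist

# One probable error expressed in standard deviations (normal curve).
PE_PER_SIGMA = 0.6745


def trials_per_chance_deviation(ratio_to_pe: float) -> float:
    """Average number of trials in which one pure sampling deviation
    exceeds ratio * PE in either direction."""
    z = ratio_to_pe * PE_PER_SIGMA
    two_sided_tail = 2.0 * (1.0 - NormalDist().cdf(z))
    return 1.0 / two_sided_tail


if __name__ == "__main__":
    print("4 x PE    : once in", round(trials_per_chance_deviation(4.0)))
    print("3 sigmas  : once in", round(trials_per_chance_deviation(3.0 / PE_PER_SIGMA)))
    print("1.54 x PE : once in", round(trials_per_chance_deviation(20 / 13), 1))
```

On these assumptions four probable errors are exceeded by chance roughly once in 143 trials and three sigmas roughly once in 370, in close agreement with the figures quoted from Yule.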

There is one more point to mention about this matter. Garrett 
and Anastasi quite correctly say that I had not even been content
with setting the limit for evidence at 4½ times the probable error,
but had proposed as much as five times. My so doing, they urge, is
“stacking the cards heavily in favour of the Two-Factor theory.”
But these authors are strangely overlooking the fact that really I had
proposed a double limit. Five times the probable error had only
been advanced by me as needed for evidence to be regarded as
“conclusive.” This last term means that (so far as mere statistics go)
the search for evidence is brought to its conclusion; any further
enquiry would be superfluous. The authors omit to mention that
over and above this “conclusive” limit I had also proposed a far milder
one; namely, only three times the probable error; this ratio was said
by me to be at any rate “suggestive” and therefore to indicate cases
which needed further investigation (Abilities of Man, p. 295 and 
elsewhere). 


II. DERIVATION OF PROBABLE ERRORS FROM HYPOTHESES 


So far, we have solely considered how the probable error should 
be interpreted. There remains the no less fundamental question as 
to how it should be obtained. Again the matter in itself affects 
statistics quite generally, but will here be illustrated by special refer- 
ence to the work of Garrett and Anastasi. 

This work centres in the task of proving or disproving, by means 
of the tetrads observed in samples, that the “true” tetrads are all zero.

Now, for calculating the probable errors of these tetrads, the values 
entering into the formulas may be obtained in two different ways. 
By one of them, all tetrads entering into the formula are set down as 
zero. By the other way, each tetrad is set down as actually observed 
by sampling. I for my part have always employed the first way. 
Garrett and Anastasi, with many other writers, adopt the second. 

Now, the objection commonly raised to the first procedure is that 
we cannot reasonably assume the tetrads to be zero, since their being 
so or not is just the matter in dispute. But this objection is, I submit,
fallacious. In general, and once more going back to first principles,
I maintain that we do throughout science assume just that which
is in dispute; we assume this hypothetically, and then proceed to 
examine whether the consequences deducible from it agree or not with 
the actual observations. As for the particular case of probable errors 
(whether of tetrads or otherwise), I suggest that the values entering 
into them should always when possible be taken from some hypothesis; 
not from anything observed in samples. 

The situation can be represented still more strongly. Let A 
denote the hypothesis or assumption (here the two terms are equiva- 
lent) that all the true tetrads are zero; and let B stand for their 
having any other value whatever (observed or otherwise). Suppose, 
further, that these two assumptions lead to appreciably different 
consequences (or else the problem vanishes). Then, if the actual 
observations were to agree with the consequences of B, this result, 
far from constituting, as our authors think, the correct support of the 
hypothesis A, would instead tend to oppose it! 

What, then, it may be asked, is the reason for the common statis- 
tical practice being otherwise, in that the probable error employed 
does not usually represent the deviation to be expected from any 
hypothesis, but instead the deviation to be expected from some value 
actually observed by means of sampling? 

The answer is that here, as always, the observed value plays merely 
the part of a makeshift. The really needed hypothetical value cannot 
be used, as it is unknown. It is not even determinate at all; it may lie 
anywhere not too far from the sampling value. The matter is explic- 
itly treated by Yule. He not only shows that the usage of the observed 
value is no more than a makeshift, but even supplies a procedure for 
guarding against the dangers which such a makeshift usage incurs.¹


III. LIMIT TO SCOPE OF QUANTITATIVE EVIDENCE 


The next point to be taken here is no less fundamental and general. 
Our authors write: 


In order to be practically certain of finding a general factor, therefore, all one 
needs to do is to keep down the sample to twenty-five cases or less, as it is next to 
impossible for a tetrad to be significantly greater than its PE under such con- 
ditions. However, this is an extremely dubious test of the presence of a general 
factor, although Spearman does not seem to think so. 





1 Introduction to “Theory of Statistics,” fifth edition, p. 277.
Far from really adopting any such strange “test of the presence of a
general factor,” I would maintain rather that, strictly speaking, there
does not and cannot exist any absolute test whatever of its presence. 
I would go much further and say that no precise quantitative theorem 
at all—whether affirmative or negative, psychological or even physical 
—admits of being absolutely proved. Instead of proving the theorem 
to be true, one can only hope to show that its error, if any, has less than 
some estimable magnitude. In other words, a theorem can only be 
proved within the limits prescribed by the error of the experiment. Thus 
the suggested policy of using a single and ridiculously small sample 
would, indeed, escape bringing evidence against the two factors; but 
only at the price of not bringing any appreciable evidence of any kind. 
Instead of being “capable of finding a general factor,” we should
become certain of not ascertaining anything at all! 

Incidentally it may be added that numerous samples, even if 
individually very small indeed, can quite well be made to yield excellent 
evidence either for or against a theory. A word, too, about the per- 
sonal reference. Even apart from the preceding theoretical considera- 
tions—repeated by me times without number—I do protest that 
Garrett ought to have been saved from his misrepresentation of me by 
considering the actual practice both of myself and my collaborators. 
During more than a quarter of a century, we have never employed 
samples of less than about one hundred individuals; and at least three 
times we have brought forward more than one thousand. Samples of 
less than about one hundred have only been taken by us into considera- 
tion when—as occurred with the work of Simpson—an attempt had 
already been made to exploit these against us. 


IV. THE ALTERNATIVE VIEWS SEEN IN THE LIGHT OF THEIR PRACTICAL 
EFFECTS 


So far, we have only considered the work of Garrett and Anastasi 
in respect of their interesting theoretical attitudes and discussions. 
But they have also supplied some valuable experimental statistics. 
In particular, they have obtained samples of twenty-five, fifty, one 
hundred, and two hundred individuals, where the variables, four in 
number, had a prearranged constitution. In one case, which we will 
call A, each variable was composed of a general and a specific factor, 
with no group factor. In another case, B, each variable consisted 
solely of specific factors. In both cases then, the “true” tetrads are 
known to be zero. With this known true situation was compared
the situation as inferred from the experimental results of sampling. In 
this way, the validity of the method of inference from samples was put 
on trial. 

The issue of this trial is, according to our authors, adverse. For 
they conclude that with small samples “tetrad analysis from observed
r’s breaks down.” Such samples, they urge, may “give a very false
picture of the basic constitution of the variables concerned.”

From this general condemnation let us turn to the actual detailed 
experimental results from which it has been derived. The leading 
features seem to be as follows: In A two out of the twenty-four tetrads 
from samples of twenty-five are reported to be about four times their 
PE’s; in B one tetrad from the samples of one hundred is eight times its 
PE; and again in B one from the sample of two hundred is five times its 
PE. Bizarre results, it must be conceded. 

But to myself such results seemed to indicate a need for checking 
the arithmetic. This was done with the aid of a statistical class, 
to whom I here tender my cordial thanks. On so doing, we found 
every one of the anomalously large tetrads disappear. None remained
that were much more than three times their PE’s.¹

After making these corrections in the experimental results, let 
us turn back to the task of estimating their validity for the purpose 
of inferring the true situation. We must remember from Section I 
of this article that this inference can be undertaken in various ways; in 
particular, different ratios of observed deviations to probable errors
may be assumed to be “significant.” Suppose first that, following
the way of our authors, we accept as “almost certainly” significant
a ratio 1.54 to 1. Then the inference does indeed “break down”;
we should have to deduce from the experimental data a terrific
number of false theorems. In the case of experiment A (see above),
for instance, no less than eleven out of the forty-five tetrads would
have to be taken as significant, although not one of them is so. But let
us now adopt instead of their so misleading ratio our double limit of 3:1 
and 5:1. On so doing, no wrong tetrads at all would be accepted as 
demonstrated, and only one or two would be regarded as suggestive. 





1 To give an example, our authors had in B obtained the surprising result of
t1234 = +.0060 ± .0012. Instead, we found the quite normal value of .0028 ±
.0114. The reader can at once check these figures for himself, seeing that r12 =
.0849, r13 = .2220, r14 = .0841, r23 = .0478, r24 = .0157, r34 = .0739 and N = 200.
(P. 246 of their paper.)
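
The check invited in this footnote can be sketched as follows for the tetrad itself. The pairing r12·r34 − r13·r24 is assumed here because it reproduces the .0028; the labels given to the other two tetrads follow a common convention and are likewise assumptions. The probable error is not recomputed, since its formula is not stated in the text.

```python
# Correlations quoted in the footnote (experiment B, N = 200).
r = {
    (1, 2): 0.0849, (1, 3): 0.2220, (1, 4): 0.0841,
    (2, 3): 0.0478, (2, 4): 0.0157, (3, 4): 0.0739,
}


def tetrad_differences(r):
    """Return the three tetrad differences obtainable from four variables.

    Assumed convention: t1234 = r12*r34 - r13*r24, and similarly for the rest.
    """
    return {
        "t1234": r[(1, 2)] * r[(3, 4)] - r[(1, 3)] * r[(2, 4)],
        "t1243": r[(1, 2)] * r[(3, 4)] - r[(1, 4)] * r[(2, 3)],
        "t1342": r[(1, 3)] * r[(2, 4)] - r[(1, 4)] * r[(2, 3)],
    }


tetrads = tetrad_differences(r)

if __name__ == "__main__":
    for name, value in tetrads.items():
        print(name, "=", format(value, "+.4f"))
```

With these correlations none of the three tetrads reaches .0060 in magnitude, and the first agrees with the corrected value of .0028.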
We appear entitled to conclude that these data of our authors do
indeed give a “very false picture” when handled according to their
views, but a perfectly correct picture (so far as it goes) when handled 
according to ours. This picture is, of course, to the effect that neither 
in experiment A nor in B were there any group factors large enough to 
be appreciable within the limits of the error of the experiment. 

So much for the lesson of the experimental data. But there is, 
I believe, a much more crucial way of trying out the different views. 
This is by calculating theoretically how their views would fare respec- 
tively in the long run. Suppose that in the course of different investiga- 
tions by all workers some ten thousand tetrads were to occur (a 
moderate enough estimate, since as many as a third of this number have 
already occurred in a single investigation, and work on a still larger 
scale is in progress). If we are to accept the authors’ view that a
deviation 1.54 times its PE is “almost certainly” significant, then, I
submit, we should arrive at irrevocably registering no less than one 
thousand six hundred false theorems! Even if we were only to accept 
as conclusive those deviations which were four times as much as their 
PE’s, even then we should irretrievably accept about eighty false 
theorems. Surely, this would be muddle enough. Whereas on adopt- 
ing the policy of a double limit as recommended by us above, we should 
be unlikely to adopt in a final manner any false theorem at all, but 
should only mark out some four hundred theorems about which to 
suspend final judgment pending further evidence. This end-result of 
our procedure would appear to be not unsatisfactory. 
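
The long-run reckoning may be sketched as follows, on the supposition that all ten thousand true tetrads are zero and that sampling deviations are normal. The 1.54 limit is counted one-sidedly, as the authors' "chances in one hundred" appear to be, and the other limits two-sidedly; both choices, like the function name, are assumptions of the sketch rather than statements from the text.

```python
from statistics import NormalDist

# One probable error expressed in standard deviations (normal curve).
PE_PER_SIGMA = 0.6745


def expected_chance_exceedances(n_tetrads, ratio_to_pe, two_sided=True):
    """Expected number of truly zero tetrads whose sampling value
    exceeds ratio * PE by chance alone."""
    tail = 1.0 - NormalDist().cdf(ratio_to_pe * PE_PER_SIGMA)
    if two_sided:
        tail *= 2.0
    return n_tetrads * tail


N_TETRADS = 10_000  # the figure supposed in the text

if __name__ == "__main__":
    print("beyond 1.54 PE (one-sided):",
          round(expected_chance_exceedances(N_TETRADS, 1.54, two_sided=False)))
    print("beyond 4 PE:", round(expected_chance_exceedances(N_TETRADS, 4.0)))
    print("beyond 3 PE:", round(expected_chance_exceedances(N_TETRADS, 3.0)))
    print("beyond 5 PE:", round(expected_chance_exceedances(N_TETRADS, 5.0), 1))
```

The first three counts come out near 1,500, 70 and 430 respectively, which is the same order as the rounded figures of one thousand six hundred, eighty and four hundred given above; exact agreement is not to be expected from so rough a model.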
SOME EVIDENCES OF EFFECTS OF THE PUPIL’S
CLASSROOM ADJUSTMENT UPON HIS 
ACHIEVEMENT TEST PERFORMANCE 


CHARLES W. ST. JOHN 


Dana College, Newark, N. J. 


A study of the relations between intelligence test performance and 
educational achievement in the elementary grades¹ incidentally
brought to light some data which at least strongly suggest that scores 
in achievement tests are significantly affected by the degree of adjust- 
ment or maladjustment of pupils to their teachers and general class- 
room situations. The further study of the matter which is here 
reported tends to confirm such a belief. 


THE SUBJECTS, DATA, AND STATISTICAL METHOD 


The subjects were five hundred three boys and four hundred fifty- 
five girls in the public schools of a residential suburb of Boston. 
The study covered a four-year period, during which the pupils were 
enrolled in sixteen different schools, with one hundred eighty different 
teachers, and in grades ranging from the first to the sixth, but chiefly 
the first four. 

The achievement test data were obtained from the records of the 
Harvard Growth Study. The tests used were the Haggerty Sigma 1 
reading test, the Ayres reading test (Burgess Picture Supplement), 
and the Peet-Dearborn arithmetic tests. The teachers’ marks used 
in the study were the final marks of the school year, which were given 
on a five-point scale: A, B, C, D, and E without plus or minus signs, 
both D and E indicating unsatisfactory work. The following marks 
were used in the study: teachers’ marks in reading, arithmetic, conduct, 
and effort, and the average of all marks in the subjects of study 
(averaged by the writer). 

All measures were transmuted into their sigma equivalents, and 
only these derived measures were used in the study of correlations. 
Each achievement test score was given its actual plus or minus sigma 
value in the distribution for both sexes together in the grade (in all 
schools) in which the pupil was enrolled when the test was taken. The 
number of scores in these several distributions ranged from about 





1 St. John, Charles W.: “Educational Achievement in Relation to Intelligence.”
Cambridge: Harvard Studies in Education, Vol. XV, Harvard University Press, 
1930. 
four hundred to about eight hundred. The transmutation of teachers’
marks was accomplished by taking the sigma equivalents of percentile
points of the probability surface, a method described by Thurstone.¹
That is, five distributions were taken of the marks of each of the 
one hundred eighty teachers: For reading, arithmetic, conduct, effort, 
and average. All marks of about half of the teachers were eliminated 
from further consideration because of insufficiently numerous or 
atypical pupil groups. For each distribution of the remaining 
teachers, the plus or minus percentile value of each mark in its own 
distribution was determined, and the sigma equivalent of this value was 
read from a table. While this is admittedly a crude and inaccurate 
statistical procedure, it largely eliminates the effects of the very great 
individual differences among these teachers in marking, and reinter- 
prets each mark in terms of its position in its own classroom distribu- 
tion. Thus it becomes fairly comparable with the derived values of 
the marks of other teachers, and with the derived values of the achieve- 
ment test scores, which indicate the position in the entire grade-level 
distribution. 
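
The two transmutations described above may be sketched as follows. The sketch assumes that the percentile value of a mark is the mid-point of the band it occupies in the teacher's own distribution, and it uses the inverse normal function in place of a printed table; the exact rule of the method cited from Thurstone is not reproduced in the text, so these details and the function names are assumptions.

```python
from statistics import NormalDist, mean, pstdev


def sigma_equivalents(mark_counts, order=("E", "D", "C", "B", "A")):
    """Sigma equivalent of each mark within one teacher's distribution.

    mark_counts: number of pupils given each mark by this teacher.
    order: marks from lowest to highest.
    Assumption: a mark's percentile is the mid-point of its band.
    """
    total = sum(mark_counts.get(mark, 0) for mark in order)
    below = 0
    result = {}
    for mark in order:
        n = mark_counts.get(mark, 0)
        if n:
            mid_percentile = (below + n / 2) / total
            result[mark] = NormalDist().inv_cdf(mid_percentile)
        below += n
    return result


def sigma_scores(scores):
    """Plus or minus sigma value of each test score in its grade distribution."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


if __name__ == "__main__":
    # Hypothetical distributions for two teachers who mark differently.
    print(sigma_equivalents({"A": 2, "B": 6, "C": 10, "D": 6, "E": 2}))
    print(sigma_equivalents({"A": 12, "B": 10, "C": 3, "D": 1}))
```

The same letter mark thus receives a different derived value from a lenient and from a severe marker, which is the effect the procedure is meant to secure.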

This procedure for the transmutation of marks of course yields 
for the same marks (A, for instance, or B) different values for different 
teachers, so that the derived values for all the 89 teachers form 
practically a continuous series. Accordingly, correlations were 
computed by the Pearson product-moment method, although it will 
be remembered that in fact the original marks were given on a five- 
point scale. 


CORRELATIONS OF ACHIEVEMENT TESTS WITH MARKS 


In these first computations the data were used for all available 
years of the individual’s period of record, taken together, excepting (1) 
those years when a grade was being repeated and those immediately 
following the skipping of a grade, at which times the values indicating
grade position in achievement would be misleading, and (2) all years
when a pupil was accelerated or retarded in his grade, the ages 6-0 in
September to 7-11 in June being considered normal for Grade I,
7-0 to 8-11 for Grade II, etc. The results of the two different reading
tests were entered together in the same series of correlation tables. 

The correlation coefficients are shown in Table I. In the first and 
third columns are given the correlations of marks in June of each year 





1 Thurstone, L. L.: “The Fundamentals of Statistics.” New York: The
Macmillan Company, 1925, pp. 157-160. 
of record with achievement test scores in November of the same
school year, and in the second and fourth columns, the correlations 
of marks in June with achievement test scores in November of the 
following school year. 


TABLE I. CORRELATIONS OF MARKS IN JUNE WITH ACHIEVEMENT TEST SCORES
OF THE SAME SCHOOL YEAR AND THE FOLLOWING SCHOOL YEAR

                           Reading tests in November           Arithmetic tests in November
Teachers’ marks   Sex      Of the same     Of the following    Of the same     Of the following
in June                    school year     school year         school year     school year
                           r ± PE          r ± PE              r ± PE          r ± PE

Reading           Boys     +.54 ± .04      +.34 ± .04          +.29 ± .02      +.17 ± .03
                  Girls    +.43 ± .04      +.30 ± .05          +.38 ± .02      +.32 ± .03
Arithmetic        Boys     +.50 ± .04      +.22 ± .05          +.50 ± .02      +.30 ± .03
                  Girls    +.39 ± .04      +.32 ± .05          +.44 ± .02      +.35 ± .03
Average           Boys     +.54 ± .04      +.33 ± .05          +.49 ± .02      +.27 ± .03
                  Girls    +.38 ± .04      +.33 ± .05          +.43 ± .02      +.42 ± .03
Conduct           Boys     +.31 ± .05      +.30 ± .05          +.22 ± .03      +.20 ± .03
                  Girls    +.24 ± .05      +.10 ± .05          +.12 ± .03      +.14 ± .03
Effort            Boys     +.40 ± .04      +.28 ± .05          +.35 ± .02      +.28 ± .03
                  Girls    +.29 ± .04      +.06 ± .05          +.28 ± .02      +.30 ± .03
(Mean)            Boys     (+.46 ± .04)    (+.29 ± .05)        (+.37 ± .02)    (+.24 ± .03)
                  Girls    (+.35 ± .04)    (+.22 ± .05)        (+.33 ± .02)    (+.31 ± .03)

Attention is directed to the striking differences between the corre- 
sponding coefficients of the first and the second columns, and between 
those of the third and the fourth columns. With only two exceptions, 
the correlations of marks in June with tests in November of the same 
school year are higher than the corresponding correlations of marks 
in June with tests in November of the following school year. The 
amount of this difference is shown in Table II, in which are given, 
first the result of subtracting the “following year” coefficient from
the corresponding “same year” coefficient, second, the mean of the
Probable Errors of these two coefficients, and third, the quotient when 
the excess is divided by the mean PE. 

TABLE II. DIFFERENCES IN CORRELATIONS OF MARKS IN JUNE WITH
ACHIEVEMENT TEST SCORES OF THE SAME SCHOOL YEAR AND OF THE
FOLLOWING SCHOOL YEAR

                           Reading tests                        Arithmetic tests
Teachers’ marks   Sex      Excess of    Mean    Excess          Excess of    Mean    Excess
in June                    same year    PE      divided by      same year    PE      divided by
                           over                 mean PE         over                 mean PE
                           following                            following
                           year                                 year

Reading           Boys     +.20         .04     5.0             +.12         .025    5.0
                  Girls    +.13         .045    2.9             +.06         .025    2.4
Arithmetic        Boys     +.28         .045    6.2             +.20         .025    8.0
                  Girls    +.07         .045    1.4             +.09         .025    3.6
Average           Boys     +.21         .045    4.7             +.22         .025    8.8
                  Girls    +.05         .045    [illegible]     +.01         .025    .4
Conduct           Boys     +.01         .05     .2              +.02         .03     .7
                  Girls    +.13         .05     2.6             -.02         .03     -.7
Effort            Boys     +.12         .045    2.7             +.07         .025    2.8
                  Girls    +.23         .045    5.1             -.02         .025    -.8
(Mean)            Boys     +.17         .045    3.8             +.13         .025    5.2
                  Girls    +.13         .045    2.9             +.02         .025    .8

The consistency of the differences in the r’s suggests that the differences
are reliable, and this impression is at least moderately confirmed
in Table II. Half of the differences are from three to nine times as
great as the mean of the PE’s of the two coefficients which were compared;
and the cases where the differences are least are those where
the original r’s were so low as to be of little or no significance. The
differences, as well as the original r’s, it will be observed, are
substantially greater for boys than for girls, six of the ten differences for
boys being four or more times as great as the mean PE’s, whereas such
an amount of difference is found in only one instance for the girls, that
being a case where both r’s are so small as to be negligible.
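
The three entries of Table II for any pair of coefficients follow from the procedure stated above: subtract, average the two PE's, and divide. A minimal sketch is given below with hypothetical function names. The second function, which combines the two PE's as the root of the sum of their squares, is a common alternative for independent estimates and is added here only for comparison; it is not the method of this study, and the two coefficients compared here come from the same pupils.

```python
from math import sqrt


def excess_over_mean_pe(r_same, pe_same, r_following, pe_following):
    """Return (excess, mean PE, excess / mean PE) as tabulated in Table II."""
    excess = r_same - r_following
    mean_pe = (pe_same + pe_following) / 2
    return excess, mean_pe, excess / mean_pe


def excess_over_pe_of_difference(r_same, pe_same, r_following, pe_following):
    """Comparison only (not the study's method): ratio of the excess to
    sqrt(PE1^2 + PE2^2), which treats the two coefficients as independent."""
    pe_diff = sqrt(pe_same ** 2 + pe_following ** 2)
    return (r_same - r_following) / pe_diff


if __name__ == "__main__":
    # Boys, marks in reading against reading tests (values from Table I).
    print(excess_over_mean_pe(0.54, 0.04, 0.34, 0.04))
    # Boys, marks in arithmetic against reading tests.
    print(excess_over_mean_pe(0.50, 0.04, 0.22, 0.05))
    print(excess_over_pe_of_difference(0.54, 0.04, 0.34, 0.04))
```

Because the root of the sum of squares is larger than the mean of two PE's, the comparison ratio is always the smaller of the two, so that the tabulated quotients are the more lenient measure.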


INTERPRETATION OF THESE CORRELATIONS 


The writer was surprised to find that the correlations of marks in 
June with achievement test performance in November of the same 
school year (seven consecutive school months earlier) were so uniformly 
greater than the correlations of marks in June with achievement test 
performance in November of the following school year (three vacation 
months and two school months later). Nelson’s review summarizing 
twelve studies of the effects of the summer vacation on school achievement,
and particularly his own investigation of the matter,¹ would
suggest that the general achievement level of the preceding June would 





1 Nelson, Martin J.: “The Differences in the Achievement of Elementary
School Pupils before and after the Summer Vacation.” Madison: Bureau of
Educational Research Bulletin No. 10, University of Wisconsin, June, 1929. 
be just about regained, after the summer loss, by the time of the
November testing in the following school year, and thus that the
correlations for the “following school year” would be greater than for
the “same school year.”

For the reversal of this expected outcome no purely statistical 
explanation suggested itself. The conditions of the Growth Study 
made it extremely unlikely that it was due to a modification of teachers’ 
marks because of knowledge which they had of test scores. It was 
believed that the reason was in fact the adjustment or maladjustment 
of the pupil to his general classroom situation, and in particular, to his 
teacher. It was tentatively assumed that a pupil found himself well 
or poorly adjusted, in general or in one or more subjects, and that this 
affected both the teachers’ marks and the performance in achievement 
tests, so that these two measures tended to approach the same level. 
If this were the case, it would be expected that the different pupil- 
classroom and pupil-teacher adjustments which would be obtained in 
another school year would result in a lower correlation of marks in 
one school year with achievement test scores in another school year. 
This, of course, is the situation actually observed. 

It seems at first thought possible that the significant factor, which 
is approximately constant within the school year but varies from year 
to year, is the technical instructional proficiency of the teacher, that is, 
the command of the specific techniques of instruction as distinguished 
from the more obscure personality adjustments implied in the phrase, 
“pupil-teacher adjustment.” But if instructional proficiency were the
cause, it should be expected that the changes in this factor from year to 
year would increase the discrepancy rather than the likeness between 
tests in November and marks in June of the same school year. Fur- 
ther, it should be expected that the same conditions would be observed 
in pupils of the two sexes. The most superficial examination of 
Tables I and II, however, shows that the differences under considera- 
tion are more marked in the case of boys than in the case of girls. 

This immediately relates itself with one of the most significant 
findings of the present writer’s original study,¹ namely, the many
consistent and diverse evidences of greater school maladjustment of 
boys than of girls (the same boys and the same girls who are the 
subjects of the present study). There was no significant sex difference 
in intelligence. The chief sex differences were these: (1) School 





1 Op. cit.: Pp. 103-106, 110-111, 114-115, 118, 122, 128-129, 130, 135, 136-137,
139-140, 142-144, 146, 147, 153-154, 175-179.

achievement (as evidenced by marks, achievement test scores and
grade progress), in general and in all special pupil-groups studied, was
poorer for boys than for girls, both on the average and at identical
IQ levels, the difference being greatest in teachers’ marks, and
especially in “conduct” and “effort.” (2) The correlations of IQ’s
with all criteria of achievement were lower for boys than for girls. 
(3) The differences between the r’s, IQ’s-with-tests and IQ’s-with- 
marks (the test r’s being the greater), were about twice as great for 
boys as for girls. (4) Marks in conduct and in effort correlated with 
achievement criteria more highly in the case of boys than in the case 
of girls. From these facts it was inferred that the boys were much 
more seriously maladjusted in school than were the girls, and that the 
marks and in fact the achievements of the boys were more affected than 
were those of the girls by the characteristics of personality and behavior 
which are generally represented in marks in effort and in conduct, and 
probably by other similar characteristics. It was further inferred 
that this maladjustment between the teachers and the boys probably 
was attributable chiefly to the fact that all the teachers were women. 
It is at least perfectly clear that there was a serious school malad- 
justment of these boys, and this fact is consistent with, and appears to 
strengthen, the tentative conclusion already advanced, that the 
reason for the excess of the “same year” correlations over the “following
year” correlations is to be found in the pupil-classroom and
pupil-teacher maladjustment, which is more marked in boys than in girls.


CORRELATIONS OF ACHIEVEMENT TESTS WITH MARKS IN SUCCESSIVE 
YEARS 


It now seemed desirable to determine whether the condition 
observed was progressive or cumulative from year to year. Each 
of the correlations previously reported covered a period of four years, 
and hence these afforded no information on this point. Accordingly, a 
selection was made of all pupils who were six years old but less than 
seven when first entering the first grade at the beginning of the four- 
year period of record available. All data were eliminated for years 
immediately after skipping a grade, while repeating a grade, and while 
accelerated or retarded, as in the previous computations. The result- 
ing r’s are less reliable than were those previously reported, since the 
number of cases for each correlation now ranges from about eighty 
to one hundred thirty. The Probable Errors of the r’s range from
±.04 to ±.06. Not enough reading test data were available to give
reliable correlations on this basis, and thus only the correlations of
arithmetic tests with marks could be studied.


TABLE III. CORRELATIONS OF ACHIEVEMENT TESTS WITH MARKS IN SUCCESSIVE YEARS

                            | 1922-23 | 1923-24 |   1924-25   | 1925-26 |  Means

Arithmetic tests with arithmetic marks
  Boys
    Same school year        |  +.53   |  +.46   |    +.62     |  +.36   | (+.49)
    Following school year   |  +.40   |  +.32   | [illegible] |  .....  | (+.36)
    Excess of “same year”   | (+.13)  | (+.14)  |   (+.26)    |  .....  | (+.18)
  Girls
    Same school year        |  +.50   |  +.52   |    +.49     |  +.24   | (+.44)
    Following school year   |  +.22   |  +.35   | [illegible] |  .....  | (+.29)
    Excess of “same year”   | (+.28)  | (+.17)  |   (+.19)    |  .....  | (+.21)

Arithmetic tests with average marks
  Boys
    Same school year        |  +.47   |  +.30   |    +.57     |  +.51   | (+.46)
    Following school year   |  +.27   |  +.48   | [illegible] |  .....  | (+.35)
    Excess of “same year”   | (+.20)  | (-.18)  |   (+.27)    |  .....  | (+.10)
  Girls
    Same school year        |  +.54   |  +.37   |    +.58     |  +.36   | (+.46)
    Following school year   |  +.33   |  +.39   | [illegible] |  .....  | (+.37)
    Excess of “same year”   | (+.21)  | (-.02)  | [illegible] |  .....  | (+.12)

Arithmetic tests with reading marks
  Boys
    Same school year        |  +.39   |  +.20   |    +.49     |  +.36   | (+.36)
    Following school year   |  +.29   |  +.30   | [illegible] |  .....  | (+.24)
    Excess of “same year”   | (+.10)  | (-.10)  |   (+.37)    |  .....  | (+.12)
  Girls
    Same school year        |  +.60   |  +.27   |    +.48     |  +.36   | (+.43)
    Following school year   |  +.35   |  +.26   | [illegible] |  .....  | (+.27)
    Excess of “same year”   | (+.25)  | (+.01)  |   (+.29)    |  .....  | (+.18)

Means of r’s, arithmetic tests with the three marks
  Boys
    Same school year        |  +.45   |  +.32   |    +.56     |  +.41   | (+.435)
    Following school year   |  +.32   |  +.37   | [illegible] |  .....  | (+.32)
    Excess of “same year”   | (+.14)  | (-.05)  |   (+.30)    |  .....  | (+.12)
  Girls
    Same school year        |  +.55   |  +.39   |    +.62     |  +.32   | (+.47)
    Following school year   |  +.30   |  +.33   | [illegible] |  .....  | (+.31)
    Excess of “same year”   | (+.25)  | (+.06)  |   (+.22)    |  .....  | (+.18)
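The Probable Errors quoted for these r’s can be checked with a short sketch. It assumes the conventional formula of the period, PE = 0.6745(1 - r²)/√N, which the text does not state explicitly; the function name and the illustrative r of .45 are hypothetical and are not taken from the study.

```python
import math

def probable_error_of_r(r: float, n: int) -> float:
    """Conventional probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n).

    Assumption: this is the formula behind the PE's quoted in the text;
    the article itself does not print it.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# Illustrative (hypothetical) r of .45 at the two sample sizes named in the text.
for n in (80, 130):
    print(n, round(probable_error_of_r(0.45, n), 3))
```

With about eighty to one hundred thirty cases and r’s of moderate size, values of this formula fall in the neighborhood of the ±.04 to ±.06 reported.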

These correlations are reported in Table III, which is to be read as 
follows: (First column, read vertically) for boys, the r of arithmetic 
tests in November, 1922 with arithmetic marks in June, 1923 (same
school year) was +.53, and for arithmetic marks in June, 1923 with
arithmetic tests in November, 1923 (following school year) the r was
+.40, the excess of the “same year” r over the “following year” r thus
being +.13. The second column reports as “same year” correlations
those between tests in November, 1923 and marks in June, 1924, and
as “following year” correlations those between marks in June, 1924
and tests in November, 1924. Data beyond June, 1926 are not now
available to the writer, and therefore “following year” correlations
could not be computed for 1925-1926. 

It will be observed that the excess of the “same year” correlations
over the “following year” correlations holds for arithmetic tests and
arithmetic marks in all years, and for arithmetic tests and other marks
in every year except 1923-1924, when there appears to be a tendency in
the other direction. It will be seen also that in the short period of
years available there is only a slight suggestion of a tendency for the
differences between the “same year” and the “following year” correlations
to increase with the amount of schooling in the case of boys and to
decrease in the case of girls. Finally, the condition here is not, as it was
in the previous series of r’s, clearly more marked for boys than for 
girls. 
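The year-by-year pattern just described can be tallied directly from the excess row of Table III. The following is only a sketch: the values are transcribed from the legible cells of the table, None marks a cell that cannot be read, the +.01 for girls’ reading marks in 1923-1924 is read from a damaged cell, and the names `excess` and `sign_tally` are hypothetical.

```python
# Excess of "same year" over "following year" r's from Table III,
# for 1922-23, 1923-24 and 1924-25. None marks an unreadable cell.
excess = {
    "arithmetic marks": {"boys": [0.13, 0.14, 0.26], "girls": [0.28, 0.17, 0.19]},
    "average marks":    {"boys": [0.20, -0.18, 0.27], "girls": [0.21, -0.02, None]},
    "reading marks":    {"boys": [0.10, -0.10, 0.37], "girls": [0.25, 0.01, 0.29]},
}
years = ["1922-23", "1923-24", "1924-25"]

def sign_tally(table, year_index):
    """Return (number of positive excesses, number of legible excesses) for one year."""
    values = [
        series[year_index]
        for by_sex in table.values()
        for series in by_sex.values()
        if series[year_index] is not None
    ]
    positive = sum(1 for v in values if v > 0)
    return positive, len(values)

for i, year in enumerate(years):
    print(year, sign_tally(excess, i))
```

Only 1923-1924 shows negative excesses, and these occur with average and reading marks rather than with arithmetic marks, which agrees with the statement in the text.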

This supplementary study cannot be considered as strengthening 
the tentative interpretation previously advanced. If teacher-pupil 
maladjustment were the cause of the conditions observed, then for 
reasons already indicated one would expect the differences to be greater 
for boys than for girls. And the results for the year 1923-1924 seem 
at first to suggest that perhaps in that year the standard test scores 
were not available to the teachers and that in other years the excess of
the “same year” correlations over the “following year” correlations
was in fact due to the teachers’ knowledge of test results and the effects 
of such knowledge upon their marks. 

This assumption about the year 1923-1924, an assumption entirely
unsupported by any evidence in the Growth Study records, is hardly
compatible with the fact that between arithmetic tests and arithmetic 
marks for that year the same relationship holds, for both sexes, as in 
other years. The reversal in 1923-1924 perhaps is fictitious and due 
merely to the unreliability of the r’s. Perhaps it is associated with
the accumulation of repeaters and duller pupils in the second grade. 
(Because of the conditions of the study, these second grades have ten 
per cent more pupils rating below 95 IQ and ten per cent less of those 
rating above 105 IQ than these fourth grades have.) It appears to be 
impossible, from the evidence available, reliably to determine the cause 
of the 1923-1924 reversal. 

The pupils selected for this supplementary study were those, 
it happens, whose total years of schooling were fewest. Others, 
who were included in the original study but not in this, were in the 
second or third grade at the beginning of the record and thus had one 
or two years more of school experience by the end of the period of 
record. It seems not improbable that maladjustments would become 
more marked by that time and thus that the inclusion of these cases 
would accentuate the condition, and particularly in the case of boys. 
These more advanced pupils were numerous, but an attempt to study 
them revealed that no homogeneous group of them could be found 
which was numerous enough to give reliable correlation data. 

This supplementary study must be considered inconclusive because 
of both the unreliability of the r’s and the short period of years over 
which the study could be carried. 


CORRELATIONS OF ACHIEVEMENT TESTS AND MARKS IN THE CASE OF
“REPEATERS”


A study was now made of the “same year” and “following year”
correlations for pupils who repeated grades. In this case, since the
grade was repeated, the test scores themselves and not their sigma
values in their grade distributions were used, and all pupils who
repeated the second grade were selected for the study. The results
are reported in Table IV. The “same year” correlations are those
between arithmetic test scores of November of the repeating year 
and marks of the following June, in the same school year. The 
“following year” correlations are those between marks of the June 
preceding repetition and arithmetic test scores of the next November, 
in the following school year. The cases are few, and the r’s are the
least reliable of any reported here, their PE’s ranging from ±.04 to
± [illegible].


TABLE IV. CORRELATIONS OF ARITHMETIC TESTS WITH MARKS IN THE CASE OF
REPEATERS

Arithmetic tests with    Sex   | Same year | Following year | Excess of “same year”
Marks in arithmetic      Boys  |   +.69    |      +.02      |       (+.67)
                         Girls |   +.30    |      +.29      |       (+.01)
Average marks            Boys  |   +.22    |      +.26      |       (-.04)
                         Girls |   +.20    |      +.38      |       (-.18)
Marks in reading         Boys  |   +.87    |       0        |       (+.87)
                         Girls |   +.09    |      +.10      |       (-.01)
Marks in conduct         Boys  |   +.27    |      -.20      |       (+.47)
                         Girls |   -.06    |      +.15      |       (-.21)
Marks in effort          Boys  |   +.53    |      +.19      |       (+.34)
                         Girls |   -.04    |       0        |       (-.04)
Means                    Boys  |  (+.52)   |     (+.05)     |       (+.46)
                         Girls |  (+.10)   |     (+.18)     |       (-.09)


Here the difference between the “same year” and the “following
year” correlations is extremely marked in the case of boys and is
absent or reversed in the case of girls. Unreliable as this part of the
study is, it tends rather strongly to support the tentative conclusion
that a maladjustment, more serious for boys than for girls, is the cause 
of the conditions observed. The greatest amount of maladjustment 
would be expected, in general, at the time when the pupil is judged 
to be failing and is required to repeat a grade, and that is the time 
when the differences here under consideration are greatest. Appar- 
ently the teachers’ judgments about the failures of girls were more 
independent of this adjustment factor than were their judgments of 
the failures of boys. This is consistent with the fact that the correla- 
tions of intelligence and marks were greater for girls than for boys. 


THE CONSTANCY OF ACHIEVEMENT 


If some factor such as pupil-teacher or pupil-classroom adjustment 
or instructional proficiency were not significantly involved, and if 
teachers’ marks were distinctly affected by knowledge of achievement 
test results, it would be expected that achievement criteria would 
have a constancy approaching the reliability of the standard tests. 
But the constancy of the achievement criteria is low. 

For the determination of this, the following correlation method 
was adopted. The first-year data were entered on the vertical axis 
of the correlation table with the second-year data on the horizontal 
axis, the second-year on the vertical with the third-year on the hori- 
zontal, the third-year on the vertical with the fourth-year on the 
horizontal, and the fourth-year on the vertical with the first-year 
on the horizontal. Thus each year’s data were entered twice and no
one year was weighted more than another. The fourth-year-first-year 
data, which were first recorded separately, proved to be of no different 
order than the rest of the data. 
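The pairing scheme just described may be sketched as follows. This is a minimal illustration under stated assumptions, not the writer’s computing procedure: each pupil is assumed to have one sigma value for each of the four years, the sample records are invented, and the names `pooled_pairs` and `pearson_r` are hypothetical.

```python
import math
from statistics import mean

def pooled_pairs(records):
    """Pair year 1 with 2, 2 with 3, 3 with 4, and 4 with 1 for every pupil.

    records: list of four-element lists of yearly sigma values (assumed layout).
    Returns the vertical-axis and horizontal-axis entries of the pooled table.
    """
    vertical, horizontal = [], []
    for scores in records:
        for i in range(4):
            vertical.append(scores[i])
            horizontal.append(scores[(i + 1) % 4])  # year 4 wraps to year 1
    return vertical, horizontal

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented sigma values for three pupils over four years.
pupils = [[0.5, 0.2, 0.9, -0.1], [-1.0, -0.4, -0.8, -1.2], [1.5, 0.3, 1.0, 2.0]]
v, h = pooled_pairs(pupils)
print(len(v), round(pearson_r(v, h), 2))
```

Because every yearly value appears once on each axis, the two marginal distributions of the pooled table are identical, which is the sense in which no one year is weighted more than another.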


TABLE V. CONSTANCY OF ACHIEVEMENT OVER A PERIOD OF YEARS: PEARSON
PRODUCT-MOMENT CORRELATIONS, YEARS 1 WITH 2, 2 WITH 3, 3 WITH 4, 4 WITH 1

                       Sex   |    Marks     |    Tests
[label illegible]      Boys  |  +.49 ± .02  |  +.49 ± .03
                       Girls |  +.51 ± .02  |  +.43 ± .03
[label illegible]      Boys  |  +.35 ± .03  |  +.49 ± .03
                       Girls |  +.44 ± .02  |  +.44 ± .03
[label illegible]      Boys  |  +.42 ± .02  |
                       Girls |  +.47 ± .02  |
[label illegible]      Boys  |  +.45 ± .02  |
                       Girls |  +.43 ± .02  |
[label illegible]      Boys  |  +.47 ± .02  |
                       Girls |  +.38 ± .02  |
Means                  Boys  |    (+.44)    |    (+.49)
                       Girls |    (+.45)    |    (+.44)

The resulting correlation coefficients, reported in Table V, are 
much lower than the reliability coefficients of achievement tests, and 
the actual scatter diagrams, of course, show this inconstancy of 
achievement much more impressively. Pupils scoring in reading 
tests at some time as high as +1.5 sigma, for instance, scored at other 
times over a range from —1.0 sigma to +3.5 sigma. Again, pupils 
scoring at some time as low as —1.5 sigma scored at other times from 
—2.5 sigma to +1.0 sigma. The “scatter” for arithmetic and for 
all marks was no less than this. 


CORRELATIONS OF INTELLIGENCE TESTS WITH MARKS 


The differences between the “same year” and the “following year”
correlations of marks with achievement tests suggested a similar study 
of the correlations of marks with intelligence tests. The intelligence 
tests used were, for group testing, the Dearborn General Examinations 
A and C, the Otis Primary Examination A, and the Myers Mental 
Measure, and for individual testing, the Stanford-Binet test. The 
group tests were given at the same time as the achievement tests, in 
November, and the Stanford-Binet testing continued over a somewhat 
longer period of time, beginning in November. The average numbers 
of intelligence tests per pupil were: For the Stanford-Binet, boys 1.3
and girls 1.3; for the Dearborn test, boys 3.1 and girls 3.1; for the Otis 
test, boys 1.5 and girls 1.4; for the Myers test, boys .6 and girls .7. 
Every individual had at least one Binet test and two group tests. 

Since the Binet tests were given, of course, in isolation and in a 
situation almost entirely distinct from the classroom situation, whereas 
the group tests were given to all members of the usual class unit 
simultaneously, it seemed best to determine separately the correlations 
between marks and the Stanford-Binet tests and between marks and 
the group tests. In neither the individual nor the group testing were 
any of the standard tests of intelligence or achievement given by the 
classroom teacher, so that factor of the classroom situation is ruled out 
entirely. The intelligence criterion used was the IQ. All the Binet 
IQ’s were entered separately in one series of correlation tables, and all
the group test IQ’s together in a distinct series of tables. As in the 
correlations of achievement tests and marks, the data were excluded 
for all years when the pupil was accelerated or retarded, or was repeat- 
ing a grade, and for all years immediately following the skipping of a 
grade. Each correlation covers the appropriate data for all four years 
of record. 

Table VI reports, in the four successive columns, the correlations 
of marks in June with Binet IQ’s in the same school year, with Binet 
IQ’s in the following school year, with group test IQ’s in the same 
school year, and with group test IQ’s in the following school year.
This table and Table VII are to be read in the same way as Tables 
I and II. 

It will be observed that in fifteen of these twenty cases the “same
year” correlations are lower than the “following year” correlations,
whereas for the achievement tests in eighteen of the twenty cases the
“same year” correlations were higher than the “following year”
correlations, and that these present differences not only are somewhat 
less consistent but also are so small as to be in most instances clearly 
unreliable. There is, then, merely a suggestion of a tendency for 
marks in June to correlate more highly with intelligence tests of the 
following school year than with intelligence tests of the same school 
year. 
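The quantities entered in Table VII follow from those in Table VI by simple arithmetic, which may be sketched as below. The example uses the Table VI entries for boys’ reading marks with Binet IQ’s (r = +.39 ± .03 for the same school year, +.45 ± .04 for the following school year); the function name `excess_ratio` is hypothetical.

```python
def excess_ratio(r_same, pe_same, r_following, pe_following):
    """Excess of the "same year" r, mean of the two PE's, and their ratio."""
    excess = r_same - r_following
    mean_pe = (pe_same + pe_following) / 2
    return excess, mean_pe, abs(excess) / mean_pe

# Boys, reading marks, Binet IQ's (values from Table VI).
excess, mean_pe, ratio = excess_ratio(0.39, 0.03, 0.45, 0.04)
print(round(excess, 2), round(mean_pe, 3), round(ratio, 1))  # -0.06 0.035 1.7
```

A ratio below about three or four, as here, is what the text means in calling most of these differences clearly unreliable.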

This again makes it seem still more improbable that the excess of
the “same year” over the “following year” correlations of achievement
tests with marks was due to the teachers’ knowledge of test results and
the effects of any such knowledge upon the marks. If such were the
case, it seems most likely that the intelligence test results would also
correlate more highly with the “same year” marks than with the
“following year” marks.


TABLE VI. CORRELATIONS OF MARKS IN JUNE WITH IQ’S OF THE SAME SCHOOL
YEAR AND OF THE FOLLOWING SCHOOL YEAR

                                 Binet IQ’s in November        Group test IQ’s in November
Teachers’ marks          Sex   | Of the same | Of the following | Of the same | Of the following
in June                        | school year |   school year    | school year |   school year
                               |   r ± PE    |     r ± PE       |   r ± PE    |     r ± PE
Reading                  Boys  | +.39 ± .03  |   +.45 ± .04     | +.29 ± .02  |   +.26 ± .02
                         Girls | +.52 ± .03  |   +.53 ± .03     | +.33 ± .02  |   +.39 ± .02
Arithmetic               Boys  | +.44 ± .03  |   +.54 ± .04     | +.26 ± .03  |   +.36 ± .02
                         Girls | +.44 ± .03  |   +.56 ± .04     | +.30 ± .02  |   +.41 ± .02
Average                  Boys  | +.47 ± .03  |   +.52 ± .04     | +.40 ± .02  |   +.35 ± .02
                         Girls | +.48 ± .03  |   +.49 ± .04     | +.31 ± .02  |   +.28 ± .02
Conduct                  Boys  | +.18 ± .04  |   +.23 ± .06     | +.11 ± .03  |   +.15 ± .03
                         Girls | +.25 ± .04  |   +.24 ± .04     | +.06 ± .03  |   +.11 ± .03
Effort                   Boys  | +.30 ± .04  |   +.37 ± .04     | +.22 ± .02  |   +.24 ± .02
                         Girls | +.40 ± .03  |   +.36 ± .04     | +.21 ± .03  |   +.30 ± .02
Means                    Boys  |   (+.36)    |     (+.42)       |   (+.26)    |     (+.27)
                         Girls |   (+.42)    |     (+.44)       |   (+.24)    |     (+.30)

TABLE VII. DIFFERENCES IN CORRELATIONS OF MARKS IN JUNE WITH IQ’S OF
THE SAME SCHOOL YEAR AND OF THE FOLLOWING SCHOOL YEAR

                                      Binet IQ’s                          Group test IQ’s
Teachers’ marks  Sex   | Excess of “same | Mean | The excess  | Excess of “same | Mean | The excess
in June                | year” over      | PE   | divided by  | year” over      | PE   | divided by
                       | “following year”|      | the mean PE | “following year”|      | the mean PE
Reading          Boys  |      -.06       | .035 |    1.7      |      +.03       | .02  |    1.5
                 Girls |      -.01       | .03  |     .3      |      -.06       | .02  |    3.0
Arithmetic       Boys  |      -.10       | .035 |    2.7      |      -.10       | .025 |    4.0
                 Girls |      -.12       | .035 |    3.4      |      -.11       | .02  |    5.5
Average          Boys  |      -.05       | .035 |    1.4      |      +.05       | .02  |    2.5
                 Girls |      -.01       | .035 |     .3      |      +.03       | .02  |    1.5
Conduct          Boys  |      -.05       | .05  |    1.0      |      -.04       | .03  |    1.3
                 Girls |      +.01       | .04  |     .25     |      -.05       | .03  |    1.7
Effort           Boys  |      -.07       | .04  | [illegible] |      -.02       | .02  |    1.0
                 Girls |      +.04       | .035 |    1.1      |      -.09       | .025 |    3.6
Means            Boys  |      -.066      | .039 |    1.70     |      -.016      | .023 |     .70
                 Girls |      -.018      | .035 |     .51     |      -.056      | .023 |    2.43

There is a slight suggestion in some of the tables that the differences
between the “same year” and the “following year” correlations are
greater in arithmetic than in the other phases of achievement. This
difference, if in fact it exists, is perhaps due to greater reliability of the
tests or marks or both in arithmetic, or to the fact¹ that the summer
losses in arithmetic are greater than in reading, or possibly to the fact 
that arithmetic is the most difficult subject in the grades under con- 
sideration in this study. 


SUMMARY 


The subjects were five hundred three boys and four hundred fifty- 
five girls, registered over the period of four years in grades one to six, 
but chiefly the first four, in sixteen different schools and with one 
hundred eighty different teachers. 

All achievement criteria were transmuted into their sigma values 
in their respective grade distributions, so that the marks of different 
teachers were approximately comparable and the achievement test 
performance was precisely comparable at the grade level. The 
correlations of marks in June with achievement test performance in 
the preceding November, and with achievement test performance in the 
following November, were determined, all years of record being first 
taken together. Subsequently a similar study was made year by year, 
and another in the case of pupils repeating the second grade. The 
constancy of educational achievement was determined. The correla- 
tions of marks in June with IQ’s for Stanford-Binet tests and IQ’s for 
group tests of intelligence in the preceding November, and with these 
IQ’s in the following November, were also calculated. In all cases 
except the special study of “repeaters” the data for years when a grade
was being repeated, and for years immediately following the skipping 
of a grade as well as for years when pupils were accelerated or retarded, 
were excluded from the computations. 
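The transmutation of achievement criteria into sigma values within grade distributions may be illustrated by the sketch below. It assumes that a sigma value is the deviation of a score from its grade mean divided by the standard deviation of that grade, and it uses the population form of the standard deviation; neither detail is specified in the text. The records and the name `sigma_values` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def sigma_values(records):
    """Convert raw scores to sigma values within each grade distribution.

    records: list of (pupil, grade, raw_score) tuples (assumed layout).
    Assumes every grade contains at least two differing scores.
    """
    by_grade = defaultdict(list)
    for _, grade, score in records:
        by_grade[grade].append(score)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_grade.items()}
    return {
        pupil: (score - stats[grade][0]) / stats[grade][1]
        for pupil, grade, score in records
    }

# Invented raw scores for two grades.
records = [("a", 2, 10), ("b", 2, 20), ("c", 2, 30), ("d", 3, 40), ("e", 3, 60)]
for pupil, z in sigma_values(records).items():
    print(pupil, round(z, 2))
```

Scores so expressed are comparable across grades and teachers in the sense intended by the text, since each is stated relative to its own distribution.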

1. In the first part of the study, in eighteen of the twenty cases the 
correlations of marks in June with achievement tests in November of 
the same school year (seven school months earlier) were higher than the 
correlations of marks in June with achievement tests in November of 
the following school year (three vacation months and two school 
months later, when achievement would be expected to have reverted, 
after the summer loss, to just about the level of the preceding June). 
These differences appeared to be not only consistent but also in general 





1 See Nelson’s study of the effects of the summer vacation. 
fairly reliable, since in half of the cases the differences between the
“same year” and the “following year” correlations were from three to
nine times as great as the mean of the two Probable Errors of the r’s.

2. These differences were very much greater for boys than for girls, 
six of the ten differences for boys being four or more times as great as 
the mean of the PE’s, whereas such an amount of difference appeared in 
only one of the ten cases for girls. 

3. There are varied and conclusive evidences that these same boys 
were much more seriously maladjusted in school than were these same 
girls. 

4. The study of correlations over successive years in general confirmed
the findings summarized in 1; it gave a slight suggestion that
the differences mentioned in 1 tend to increase with the amount of
schooling for boys and to decrease for girls; it showed one year when the
differences were reversed in part; and it did not reliably show the sex
differences mentioned in 2. This part of the study was less reliable
than most other parts.

5. The differences mentioned in 1 and 2 are very much more 
marked in the case of pupils repeating grades, whose maladjustment 
would be expected to be much greater. This part of the study also is 
somewhat unreliable. 

6. Achievement, whether measured by standard tests or teachers’ 
marks, has a degree of constancy, over a period of four years, repre- 
sented by a self-correlation of about +.45, which is distinctly lower 
than the reliability coefficients of standard tests. Obviously, the 
variations from year to year are great, and pupil-teacher adjustment 
may well be an important causal factor. 

7. The correlations of marks with individual-test and group-test
IQ’s showed only a slight suggestion of a difference, statistically
unreliable, between the “same year” and the “following year” correlations,
and this suggested difference was the reverse of that for marks and
achievement tests, the “same year” correlations being in this case
lower than the “following year” correlations.


CONCLUSIONS 


The reason for the excess of the “same year” over the “following
year” correlations of marks with achievement tests is not established 
by this study, but the tentative conclusion is that the cause is to be 
found in the pupil-classroom and especially the pupil-teacher adjust- 
ment or maladjustment, and particularly, the maladjustment between 
women teachers and boys. This adjustment factor varies from year
to year, but in any one year, it is assumed, it affects not only the
teachers’ marks but also the performance in standard achievement
tests so that these two measures tend to approach the same level.
Thus the difference of this adjustment factor from year to year causes
the “following year” correlations to be lower than the “same year”
correlations. Several studies¹ ² show significant effects of other types
of “adjustment” and of such factors as praise and reproof upon
achievement test performance, so that an assumption of this kind is not
without justification on general grounds.

The evidence here advanced suffers from the defects of a purely 
statistical study, but it is at least sufficient to suggest further and more 
conclusive investigations of this matter, which, it may be hoped, will 
be undertaken by someone who has access to records which are more 
complete and cover a longer period of years than those now available 
to the writer. 

The problem relates, it seems, not so much to achievement test 
reliability as to the extent of the effect of pupil-teacher and pupil- 
classroom adjustment upon achievement. 





1 Reviewed by A. M. Jordan in his “Educational Psychology.” New York:
Henry Holt & Co., 1928, pp. 97-121. Bibliography, pp. 124-125.

2 Courtis, S. A.: ‘‘Why Children Succeed.” Detroit: Courtis Standard Tests, 
1925, pp. 108-139. 
A PRELIMINARY REPORT ON A STUDY OF
THE INTERRELATIONSHIP OF CERTAIN
APPRECIATIONS¹


HERBERT A. CARROLL 
University of Minnesota 


Probably no psychologist would maintain that the ability to 
appreciate exists as an intellectual or emotional entity, functioning 
equally well whether a picture, a book, or a musical composition serves 
as a set of stimuli. The question is not one of the existence or non- 
existence of a general faculty, but rather of the extent to which specific 
appreciations are correlated. It is a known fact that in achievement 
in the traditional school subjects coefficients of correlation tend to be 
rather high, probably as a result of the influence of the general factor 
of intelligence. Is there the same, or a similar, factor underlying the 
degree to which individuals are capable of appreciating, or are the 
different kinds of appreciations so far removed from one another that 
they must be considered as being wholly unrelated? The problems 
introduced even by this single question are staggering in their difficulty 
and scope. To devise instruments which will measure appreciation is
in itself a most complicated task. To make careful and scientific use 
of such instruments after they have been devised is an undertaking 
beset with dangers. The need, however, is urgent. This paper is a 
preliminary report of an investigation, now being conducted at the 
University of Minnesota, on the interrelationship of the abilities to 
appreciate art, literature, and music. 


MEASURING INSTRUMENTS 


The capacity to appreciate art was measured by the Meier-Seashore 
Art Judgment Test; to appreciate music, by the Hevner Music 
Appreciation Test; and to appreciate prose literature, by the Carroll 
Prose Appreciation Test. Table I presents data concerning the 
reliability of these instruments. 

TABLE I. RELIABILITY OF TESTS OF APPRECIATION OF ART, MUSIC,
AND LITERATURE

Name of test                        N   |   r   |  PE_r
Meier-Seashore art judgment        100  |  .71  |  .033
Hevner music appreciation          200  |  .68  |  .030
Carroll prose appreciation         467  |  .71  |  .016


Though the above coefficients are so low as to make dogmatic
statements with respect to individual scores a hazardous proceeding,
they are high enough to indicate that the tests possess sufficient
reliability for group comparisons; and it is with groups that this study
is primarily concerned.

1 The writer is indebted to the Committee on Educational Research, University
of Minnesota, for initiating and supporting the series of studies in art education of
which this is one.

This is not the place to present a detailed analysis of the validity 
of the tests used, as such discussions have been published elsewhere. 
The reader is referred to the following sources: (1) Meier, N. C. and 
Seashore, C. E.: “The Meier-Seashore Art Judgment Test, Examiner’s
Manual.” Iowa City, Iowa: Bureau of Educational Research and
Service, State University of Iowa, 1930. (2) Hevner, K.: Tests for
Esthetic Appreciations in the Field of Music. The Journal of Applied 
Psychology, Vol. XIV, No. 5, October, 1930, pp. 470-477. (3) Carroll, 
H. A.: A Standardized Test of Prose Appreciation. The Journal of 
Educational Psychology, Vol. XXIII, No. 6, September, 1932, pp. 
401-410. Although the prose appreciation test was constructed 
especially for senior high school pupils, it is sufficiently difficult for use 
on the college level. 

These three measuring instruments are by no means perfect, but 
the writer feels that they are the best at present available and that 
they do give at least rough approximations of the extent to which cer- 
tain capacities to appreciate are distributed in a group. If apprecia- 
tion exists at all, it must necessarily exist in some quantity. The tests 
used in this study answer the question “How much?” in quantitative 
terms. 


INTERCORRELATIONS 


After one hundred thirty-three university students had been tested
with the instruments just discussed, correlations were computed in order to
discover the extent to which appreciation of art and literature, art and
music, or literature and music tended to vary together. The results
appear in Table II.

TABLE II. INTERCORRELATIONS OF APPRECIATIONS OF ART, LITERATURE, AND
MUSIC

Appreciation of                     N   |   r   |  PE_r
Art and literature                 133  |  .24  |  .055
Art and music                      133  |  .16  |  .057
Literature and music               133  |  .12  |  .058


The correlations in Table II indicate a negligible relationship
among the three appreciations measured; in fact, the only one of the
three that is reliable is the .24 between art and literature. Evidently
there is no common factor here, such as is found in English and history,
which results in a marked tendency for the abilities to vary together.

When a partial correlation technique is used, eliminating the 
intelligence variable, the findings are but slightly altered. (See 
Table III.) 


TABLE III. INTERCORRELATIONS OF APPRECIATIONS WITH INTELLIGENCE RULED
OUT

Appreciation of                     N   |   r
Art and literature                 133  |  .25
Art and music                      133  |  .15
Literature and music               133  |  .13
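The partial correlation technique referred to above is presumably the usual first-order formula, in which the correlation between two variables is adjusted for the correlation of each with a third. The sketch below assumes that formula; the correlations with intelligence used in the illustration (.10 and .30) are hypothetical, since the article does not report them, and the name `partial_r` is likewise an assumption.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2 with variable 3 ruled out."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# r12 = .24 is the art-literature correlation of Table II; the two
# correlations with intelligence are invented for illustration only.
print(round(partial_r(0.24, 0.10, 0.30), 3))
```

When the correlations with the third variable are small, the partial coefficient differs little from the original one, which is consistent with the slight changes between Tables II and III.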











ART STUDENTS VS. NON-ART STUDENTS


A majority of the students tested were enrolled in the art department
of the College of Education at the University of Minnesota.
To be exact, of the total number of one hundred thirty-three, ninety
were art students and forty-three were non-art students. On the
possibility that a comparison of scores made by art students and
non-art students on the three appreciation tests might throw some light
on the problem of the interrelationship of appreciations, a table
presenting such data was drawn up. (See Table IV.)


TABLE IV. DIFFERENCES IN CAPACITY TO APPRECIATE ART, LITERATURE, AND
MUSIC BETWEEN ART STUDENTS AND NON-ART STUDENTS

                        Art                        Literature                      Music
Students     |  N | M (with PE)    |  σ   |  N | M (with PE)   |   σ   |  N | M (with PE)   |   σ
Art          | 90 | 103.61 ± .468  | 6.58 | 90 | 45.00 ± .604  |  8.50 | 90 | 66.50 ± .924  | 13.00
Non-art      | 43 |  94.01 ± .855  | 8.32 | 43 | 43.20 ± 1.04  | 10.09 | 43 | 60.41 ± 1.64  | 15.93
Difference   |    |   9.60 ± .975  |      |    |  1.80 ± 1.20  |       |    |  6.09 ± 1.88  |

It will be seen from this table that the only reliable difference 
between the two groups exists in art appreciation. In this variable the 
difference is approximately ten times the size of its probable error, the 
art students, as would be expected, being superior. While the art 
students maintained an advantage over non-art students in both 
appreciation of literature and appreciation of music, this advantage is 
very slight. In neither instance is the difference between the means a 
reliable one if a critical ratio of four is accepted as being indicatory 
of reliability. It is of some interest, too, that the non-art students are 
consistently more variable. 
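
The reliability judgments above can be reproduced from the means and probable errors in Table IV. The following is a minimal sketch, assuming the usual formulas of the period: PE of a mean = .6745σ/√N, and PE of a difference = the square root of the sum of the two squared probable errors. The function names are hypothetical.

```python
import math


def pe_of_mean(sigma: float, n: int) -> float:
    """Probable error of a mean: 0.6745 * sigma / sqrt(n)."""
    return 0.6745 * sigma / math.sqrt(n)


def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of a difference between two independent means."""
    return math.hypot(pe_a, pe_b)


def critical_ratio(mean_a: float, pe_a: float, mean_b: float, pe_b: float) -> float:
    """Difference between the means divided by its probable error."""
    return abs(mean_a - mean_b) / pe_of_difference(pe_a, pe_b)


# (art mean, art PE, non-art mean, non-art PE) as given in Table IV
table_iv = {
    "art": (103.61, 0.468, 94.01, 0.855),
    "literature": (45.00, 0.604, 43.20, 1.04),
    "music": (66.50, 0.924, 60.41, 1.64),
}
for test_name, values in table_iv.items():
    ratio = critical_ratio(*values)
    verdict = "reliable" if ratio >= 4 else "not reliable"
    print(f"{test_name}: critical ratio = {ratio:.2f} ({verdict})")
```

With a critical ratio of four as the standard, only the art difference qualifies, as the text states.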


COMPARING EXTREME SCORES 


It sometimes happens that, though a coefficient of correlation is 
low, there are a few students at the top or at the bottom in one variable 
who are consistently high or consistently low in all the variables 
measured. To see if this were true with respect to appreciations, the 
ten individuals who ranked highest in art and the ten individuals who 
ranked lowest in art were separated from the main group and their 
percentile ranks in art, literature and music tabulated and compared. 
The results appear in Table V. 


TABLE V.—COMPARISON OF PERCENTILE RANKS IN ART, LITERATURE, AND MUSIC
FOR TEN STUDENTS RANKING HIGHEST AND TEN STUDENTS RANKING LOWEST
ON THE MEIER-SEASHORE ART JUDGMENT TEST

                      Art                        Literature                 Music
Measures      N    M (with PE)      σ      N    M (with PE)      σ      N    M (with PE)      σ
Highest ten   10   95.20 ± .611    2.86    10   63.00 ± 5.72   26.80    10   52.00 ± 3.46   16.20
Lowest ten    10    6.40 ± .560    2.62    10   40.00 ± 6.97   32.64    10   40.00 ± 6.90   32.34
Difference         88.80 ± .829                 23.00 ± 9.02                 12.00 ± 7.72


Although those who rank highest in art tend to be above average
in both literature and music, and those who rank lowest in art tend to 
be below average in both literature and music, there is considerable 
overlapping. Moreover, the differences between the means in literature
and music are not reliable, although the probable error of each
difference is, of course, very large because of the small number of
cases involved. The data
from this approach show a tendency for those who are superior in 
one appreciation to be superior in others, and for those who are inferior 
in one appreciation to be inferior in others, although this tendency is 
by no means pronounced. 


TABLE VI.—A COMPARISON OF THE PERCENTILE RANKS IN LITERATURE AND
MUSIC OF FIVE STUDENTS WHO RANKED HIGH ON THE MEIER-SEASHORE
ART JUDGMENT TEST

Student       Art          Literature     Music
A             99.5         36.0           85.0
B             98.0         99.0           28.0
C             98.0         53.5           51.5
D             97.5         77.5           32.5
E             95.0         81.0           68.0
Mean          97.6         69.4           53.0
Range         95.0-99.5    36.0-99.0      28.0-85.0


In Table VI there appears a tabulation of percentile ranks in art,
literature, and music of those five students of the total number who
had taken all three tests who were found to rank highest on the
Meier-Seashore Art Judgment Test. The differences between the
ranks are striking, although no individual who is in the highest five
per cent on the art test falls in the lowest quartile on either the
literature or the music test. This is an indication that if an individual
possesses a high degree of ability to appreciate art, he is not likely
to be markedly deficient in his appreciation of literature or music.
Table VII, which is a comparison of the percentile ranks in art,
literature, and music of five individuals who stood low on the
Meier-Seashore Art Judgment Test, indicates that the converse is true:
namely, that an individual who stands very low in his ability to
appreciate art is not likely to be markedly superior in his appreciation
of literature or music.


TABLE VII.—A COMPARISON OF THE PERCENTILE RANKS IN LITERATURE AND
MUSIC OF FIVE STUDENTS WHO RANKED LOW ON THE MEIER-SEASHORE ART
JUDGMENT TEST

Student       Art          Literature     Music
A             6.5          60.0           40.5
B             6.0          53.5            3.5
C             4.5          11.5           28.0
D             3.5           8.5           26.5
E              .5           2.5            8.5
Mean          4.2          27.2           21.4
Range         .5-6.5       2.5-60.0       3.5-40.5


CONCLUSIONS 


If the instruments used in this study have adequately measured 
appreciation, the data gathered indicate strongly that the relationship 
existing among the capacities to appreciate art, literature, and music 
is very slight. Literature and art show a somewhat greater tendency 
to vary together than do literature and music or art and music. 
SOME THEORIES AND EXPERIMENTS IN THE FIELD
OF MEMORY¹


MARION E. GRANT 


Baylor College for Women, Belton, Texas 


Philosophical and scientific investigations in the field of memory 
and forgetting have led to the formulation of a variety of theories in 
explanation of their phenomena. Among the more outstanding of 
these theories we find the following: 


1. The Association Theory.
2. The Physiological Theory.
3. The Behaviorist Theory.
4. The Gestalt Theory.
5. The Pleasure-Pain Theory.


The first of these, the Association theory, might be briefly stated 
by saying that in regard to events and ideas occurrence together means 
recurrence together. In support of or in place of this theory, many 
writers have postulated a physiological theory which implies that 
memory is, or is the result of, the retention in the nervous system of 
some change resulting from sensory stimulation, this modification 
being retained in such a fashion that, upon appropriate stimulation, the 
original experience is reproduced in sufficiently identical terms to be 
recognized as that experience. 

The Behaviorist theory should be looked upon probably as a 
particular type of physiological theory, since the behaviorist holds that 
memory is synonymous with habit, which may be verbal habit, manual 
habit, or any other conceivable form of physical reaction, somatic or 
visceral. Thus memory is subject to the same laws of learning as have 
been considered adequate (by the behaviorist) in accounting for habit- 
formation, namely, the laws of frequency and recency. 

The Gestalt theory explains memory in terms of configurations, 
emphasizing particularly the qualities of the perceptual field which are 
essential to registration and therefore to recall. Memories are thus 
regarded as pictures of past organization, and where organization is
natural and strong, memories are accurate and complete, but where
organization is lacking it must be supplied (for example, the grouping 
or accenting of nonsense syllables) in order that memory may occur. 







1 An abstract of a dissertation presented in partial fulfilment of the
requirements for the degree of Doctor of Pedagogy in the University of
Toronto, 1931.
According to the Pleasure-Pain theory, memory is regarded as a
function of the affective quality of the original experience, or more 
specifically, we retain the pleasant and forget the unpleasant. 

Since it is evident that no one of these theories alone presents a
wholly adequate explanation of the phenomena of memory, it may be
deemed advisable to postulate a combination theory. Thus it seems
to the writer that any complete theory of memory must fulfill the
following conditions:


1. It must have a physiological basis.

2. It must offer an explanation of the function of the stimulus in registration
and recall.

3. It must account for the phenomena of forgetting, including both the fading
and the alteration of memories.


In regard to the first point, it seems probable that memory should 
be regarded as a function of the entire organism, different aspects of 
which assume the dominant rôle at different times. For instance,
neural activity (of the central nervous system) may predominate at the 
time of impression and recall, in which case we have what Bergson and 
McDougall would call true memory, whereas if muscular activity 
predominates and neural activity is at a minimum, involving very 
direct routes from stimulated sense organs to organs of response, we 
have habit memory. If, however, glandular and related activities 
predominate, we have emotional memory, pleasant or unpleasant 
depending perhaps on the division of the autonomic system which 
is functioning to promote these activities. On no account, however, 
is it to be assumed that when one aspect predominates the others are 
completely in abeyance. Memory, as already stated, is a function of 
the whole organism. From time to time, different aspects play the 
leading rôle, but it is probable that at all times there is some activity
(however slight) of all these aspects, any one of which may take the 
initiative in setting up the memory experience. 

We have next to attempt an explanation of the function of the 
stimulus or situation in registration and recall. To begin with, obvi- 
ously the situation to be remembered must first be perceived by the 
subject. Putting it axiomatically, we may say, “no perception, no
retention.” The best condition for perception and therefore for
retention may be described in a borrowed term as a state of “rapport” 
between the observer and the field observed. That which is perceived 
is that which fits in with the present dynamical organization of the 
organism. For instance, a pedestrian, whose entire energy is bent 
upon catching a train already due, is not likely to observe or later
remember a beautiful bed of tulips which he passes en route, but will 
rapidly react to an approaching car driven by a friend who might be 
induced to drive him to the station. The bed of tulips simply does not
fit in with the picture; in Gestalt terminology, it is not a part of the
configuration. Organization in the field, then, already emphasized as a
part of the Gestalt theory, seems to be especially significant in explain- 
ing this aspect of the memory problem. 

In accounting for the process of forgetting, the explanations on 
which we may draw are: the principle of disuse, fatigue, the Gestalt
laws of accentuating and levelling, and the influence of verbalization.
Certainly many bits of information escape us simply because we fail to 
make use of them. Habit memories, on account of the great amount of 
overlearning which they receive, are undoubtedly the last to become 
the victims of the forgetting process. Again, in states of extreme 
fatigue one may almost forget his own name, but adequate rest will 
restore normal functioning. We realize further that all aspects of an 
experience do not fade with equal rapidity. The minor details seem to 
go first, the general impression or form, or perhaps some particularly 
prominent feature (cf. Alice’s ‘‘grin without the cat’’) standing out 
clearly long after the rest has faded. This accentuating of a given 
aspect of the experience may be due to its emotional intensity, or to the 
extent of verbalization occurring subsequent to the original incident. 
Thus the more we talk to ourselves about an event the more vividly 
it is recalled, though modification of it may take place in the process. 

Briefly then we have shown that memory must have a physiological 
basis, that the registration and reproduction of experiences are to be 
looked upon as dependent largely upon the degree of organization 
present in the perceptual field and its relation to the organism at the 
time, and that forgetting is a somewhat selective process which takes 
place, to some extent at least, in accord with the law of disuse, the 
degree of fatigue and the Gestalt laws of accentuating and levelling, the 
former being conditioned to a very large degree by verbalization occur- 
ring subsequent to the initial experience. 


EXPERIMENTAL STUDIES 


The experimental investigations in the field of memory and forget- 
ting may be grouped under the following heads: 


1. Transfer Studies.
2. Studies on the Influence of Type and Arrangement of Material.
3. Studies on the Influence of Methods of Learning.
4. Studies on the Relation of Intelligence to Retention.
5. Studies on the Shape of the Learning and Forgetting Curves.
6. Studies of Inter-relationships between Memories.


1. The more carefully executed transfer studies, the work of Sleight 
for example, in the main lend support to the view that memory is not 
one function but many; that there is no such thing as a general memory 
faculty, but a number of specific memories more or less closely related. 
This relationship is to be found mainly in similarities of procedure in 
memorizing and in partial identity of content. This view is also 
supported by studies of inter-correlations (No. 6 above). The correla- 
tions reported vary all the way from very low to very high coefficients, 
but all are positive, thus indicating that memory functions are related, 
but not sufficiently closely to warrant the assumption that there is a 
single unitary faculty of memory. 

2(a). In regard to the second group of studies, we note first that, 
with one exception (Miss Key’s study), investigators agree in finding 
complex material, or material with complex background, more easily 
recalled than simple material up to a certain not-very-clearly-defined 
limit. (b) Studies on the influence of feeling-tone present somewhat
conflicting results, some favoring the view of the greater persistency 
in memory of pleasant materials, others finding no significant difference 
in retention of pleasant and unpleasant material, but practically all 
agree that experiences having a definite feeling-tone are better retained 
than those which might be described as neutral. (c) The influence of 
serial position in learning is sometimes considered in connection with 
other studies, with the usual conclusion that initial and final positions 
in series are more influential than intermediate positions, though 
instructions not to think back over a series tend to reduce the potency 
of primacy. (d) One investigator, Guilford, has found the presence of 
definite form or organization an aid to learning. 

3(a). Most studies agree that distributed learning is more effective 
than concentrated learning, but the superiority is greater for delayed 
than for immediate recall (i.e., cramming may be effective for the
following day, but not for the following week). (b) Findings relative to
part and whole methods of learning are conflicting, but in view of the 
evidence presented, a modified whole method seems most desirable 
since it combines the advantages of both methods, preserving the unity 
secured through the whole method with the necessary extra emphasis
on harder sections which the part-method provides. (c) Studies
agree in finding the recitation method of learning superior to the
reading method, though they differ in the distribution of recitations and
reading favored. (d) Several recent investigations tend to contradict 
the view of Ebbinghaus regarding overlearning in finding that the law 
of diminished returns is operative in the process. That is to say, 
successive repetitions, while adding something to the permanence of the 
impression, become less and less potent and are less effective with 
bright than with dull subjects. (e) Several investigators have studied 
the effect of mode of presentation, with but little agreement in their 
findings. Lyon, in an extensive study, lends support to the view that 
visual presentation is more effective than auditory presentation, but 
that a combination of the two is better than either alone. (f) Few 
experiments have been conducted relative to the problem of the influ- 
ence of time of day on learning, but Gates, in a careful study, finds the 
hour of 10 A. M. to 11 A. M. the period of maximum efficiency. (g)
Investigations on the relation of speed of learning to retention are 
almost unanimous in finding the rapid learner the more efficient; some 
experimenters find his efficiency more marked with difficult material. 

4. The coefficients of correlation between memory tests and various 
measures of intelligence are all positive, but show a wide range, from 
.10 to .93. While the differences are in some cases slight, the coeffi- 
cients for sense material are higher than those for nonsense material. 
There is some evidence that correlations with immediate memory are 
higher than those with delayed recall. 

5. From studies on the nature of the curve of retention, we find 
that its shape varies with the nature of the material, including its 
length and difficulty, the degree of learning or over-learning, the number 
and distribution of repetitions, the method of measurement and the 
individuals tested. There is some evidence that, as Ebbinghaus 
indicated in his pioneer experiment, forgetting proceeds very rapidly at 
first, more slowly later on, but the initial loss is less rapid with meaning- 
ful than with nonsense material. 


THE PRESENT STUDY 


The object of the present study was to investigate certain aspects 
of memory and forgetting and their relation to intelligence. 

Materials.—For this purpose the Detroit Advanced Intelligence
Test and the following memory tests were used: Test I—Nonsense
syllables, Test II—Letter-word pairs, Test III—Poetry, Test IV—Words,
Test V—Numbers, Test VI—Figures, Test VII—Prose description,
Test VIII—Unrelated words, Test IX—Related words,
Test X—Prose description.

Subjects.—The subjects were mainly Baylor College students, with
a small group of Academy students in the second group. The groups
tested on Tests I-III, IV-VII, and VIII-X numbered approximately
161, 121 and 127, respectively.

Procedure.—With the memory tests presented visually, the pro- 
cedure was practically the same for all experiments and may be out- 
lined as follows: 

1. A five-minute study period on test material (three in Test VII).

2. A test of immediate recall (Trial 1).

3. Three tests of delayed recall:

(a) After twenty-four hours (Trial 2).
(b) After one week (Trial 3).
(c) After five weeks (Trial 4).


With the auditory presentation (Tests VIII-X), the material of 
each test was read aloud three times by the experimenter, following 
which the procedure was the same as that outlined above, with the 
exception that the interval for Trial 4 was four weeks instead of five. 

We shall now proceed to a consideration of some of the more signifi- 
cant findings of this investigation. 


I. THE RELATION OF INTELLIGENCE TO FORGETTING 


This problem was considered first by calculating coefficients of 
correlation between scores on the Intelligence Test and scores on the 
various memory tests at the different intervals indicated above. The 
results are shown in Table I. 


TABLE I.—CORRELATIONS BETWEEN INTELLIGENCE TEST SCORES AND SCORES
ON MEMORY TESTS AT DIFFERENT INTERVALS

Test     Trial 1        Trial 2        Trial 3        Trial 4
I        .32 (.046)     .36 (.049)     .31 (.051)     .25 (.053)
II       .41 (.045)     .35 (.047)     .40 (.077)     .17 (.052)
III      .59 (.036)     .57 (.037)     .57 (.037)     .49 (.054)
IV       .42 (.050)     .20 (.058)     .20 (.058)     .10 (.060)
V        .07 (.059)     .32 (.054)     .26 (.056)     .08 (.059)
VI       .12 (.058)     .23 (.056)     .23 (.056)     .19 (.058)
VII      .24 (.067)     .45 (.057)     .29 (.065)     .39 (.060)
VIII     .37 (.052)     .13 (.058)     .04 (.060)     .05 (.060)
IX       .26 (.057)     .32 (.054)     .23 (.057)     .29 (.055)
X        .58 (.040)     .53 (.048)     .57 (.040)     .51 (.045)

In general these results show positive correlation between
intelligence and retention, the correlation being highest in the case of the
most meaningful material, the poetry and the two prose selections,
Tests III, VII and X, respectively. There seems, also, to be some 
tendency for the correlations to decrease with an increase in the time 
interval, correlations with scores on Trial 4 being lower in every case 
than those for Trial 2. In four of the tests the coefficients for Trial 4 
are higher than on Trial 1, but in two cases this is probably to be 
accounted for by the large number of undistributed scores on Trial 1 
(Tests II and VII), and in another case (Test V) the difference is very 
slight and probably insignificant. 

In comparing the results for Tests IV, VIII and IX, all of which 
were composed of lists of words, it is evident that on Trial 1 the order of 
highest correlation with intelligence is: Test IV, words with definite 
feeling-tone; Test VIII, Unrelated words; Test IX, Related words. 
This result is contrary to expectation, since one might assume that 
memory for related words would show closer relation with intelligence. 
On Trial 4, however, we do find distinctly higher correlation for the 
related material than for either of the other two lists. These facts 
suggest that organization of material benefits the more intelligent 
learner more than the less intelligent and that this difference persists 
in retention at longer intervals, whereas with material lacking such 
organization or unification (Tests IV and VIII) the superiority of the 
more intelligent group decreases with the passing of time as shown by 
the very low coefficients on Trial 4, .10 and .05. 

The materials showing the most consistently high coefficients of
correlation with intelligence are prose description, Test X, and poetry,
Test III.


II. THE RELATION OF INTELLIGENCE TO ACCURACY OR INACCURACY


The procedure in connection with this problem was to correlate
Intelligence Test scores with the total number of errors made by each
subject on all trials of each of the tests except Tests III, VII and X.
The results are shown in Table II.


TABLE II.—CORRELATIONS BETWEEN INTELLIGENCE TEST SCORES AND NUMBER
OF ERRORS

Test     r (PE)
I        -.05 (.056)
II       -.12 (.054)
IV       -.07 (.060)
V        -.28 (.055)
VI       -.26 (.056)
VIII     -.01 (.060)
IX       -.19 (.058)

We note that the correlations are very low, but negative, thus 
indicating a very slight tendency for the brighter individuals to make 
fewer errors than the dull ones. The highest negative correlations are 
obtained with the tests which showed the lowest correlations with 
intelligence test scores, Tests V and VI, numbers and figures. Appar- 
ently the making of errors on these tests was slightly more symptomatic 
of low intelligence than making errors on the other tests. Figure 1 
shows the distribution of errors made on these two tests by pupils in 
the upper and lower twenty per cent in intelligence, but even here there 
is much overlapping. 
[Figure 1 not reproduced. Legend: upper twenty per cent in intelligence; lower twenty per cent in intelligence.]
FIG. 1.—Intelligence and errors.


III. THE RELATION OF LEARNING TO FORGETTING 


It has already been noted that several earlier investigators have 
found high correlation between speed of learning and retention. The 
present study lends further support to this point of view. The evi- 
dence is presented in Table III, showing correlations between Trial 1, 
which may be looked upon as a test of learning, and each of the other 
trials.

An examination of these figures reveals the fact that, with the 
exception of those for Test V, the coefficients are decidedly substantial 
or high. Within the limits of this investigation, then, it is evident that
there is a significant positive correlation between learning and reten- 
tion, in other words, that those who learn well retain well and that 
those who are poor learners are also inefficient in remembering what 
they learn. Tests with poetry and prose description, meaningful 
material, show the highest correlations. We observe also that in the 
main there is a decrease in size of coefficient with increase of time
interval.


TABLE III.—CORRELATIONS BETWEEN TRIAL 1 AND EACH OF TRIALS 2, 3, AND 4

Test     Trial 2        Trial 3        Trial 4
I        .37 (.049)     .60 (.036)     .47 (.043)
II       .73 (.025)     .79 (.034)     .58 (.036)
III      .84 (.017)     .79 (.021)     .69 (.029)
IV       .76 (.025)     .75 (.026)     .55 (.041)
V        .38 (.050)     .12 (.058)     .09 (.060)
VI       .81 (.020)     .71 (.029)     .57 (.040)
VII      .79 (.026)     .70 (.038)     .45 (.057)
VIII     .54 (.043)     .51 (.045)     .42 (.050)
IX       .45 (.048)     .65 (.035)     .56 (.041)
X        .90 (.011)     .88 (.013)     .63 (.036)








IV. THE INFLUENCE OF POSITION IN SERIES ON RETENTION 


In tests where such a study was possible, tabulation was made of 
the number of times each item was correctly reproduced on Trial 4 
of the tests, with the result that position in series was found to be an 
influential factor in recall in Tests I, II, IV, VIII and IX. Position
at the beginning of a series was found to be more advantageous than at 
the end, the first item in a series being well retained on all tests, though 
not invariably best retained. It is found, however, that the item best 
retained never came lower than fourth in the series. In Tests II and 
IV evidence was obtained that the affective quality of the material was 
an influential factor in recall in that words with a pleasant feeling-tone 
were better retained than those of an unpleasant quality. 


V. CURVES OF RETENTION 


A careful examination of the curves of retention resulting from the 
plotting of the scores obtained gives us the following conclusions: (1) 
The shape of the curve of retention varies with the type of material 
retained and the degree of overlearning. (2) Seven of the curves 
examined conform somewhat to the Ebbinghaus type of curve, the 
most abrupt descent being observed in the case of lists of words and 
numbers. (3) Three of the curves, those for Tests II, VI, and VII, 
definitely do not conform to the Ebbinghaus type. (4) Related verbal 
material shows a lower per cent of loss than either unrelated material or
material having a definite feeling-tone. 


VI. THE RELATION BETWEEN MEMORIES 


Calculations of inter-correlations between Test scores on Trial 4 
have been made for the three series of tests, with the results shown in 
Tables IV, V, and VI. 


TABLE IV.—INTER-CORRELATIONS BETWEEN SCORES ON TRIAL 4

Series I
Test     Test I        Test II       Test III
I        ....          .01 (.06)     .39 (.05)
II       .01 (.06)     ....          .28 (.05)
III      .39 (.05)     .28 (.05)     ....














TABLE V.—INTER-CORRELATIONS BETWEEN SCORES ON TRIAL 4

Series II
Test     Test IV       Test V        Test VI       Test VII
IV       ....          .25 (.06)     .24 (.06)     .44 (.06)
V        .25 (.06)     ....          .12 (.06)     .29 (.07)
VI       .24 (.06)     .12 (.06)     ....          .35 (.07)
VII      .44 (.06)     .29 (.07)     .35 (.07)     ....

















TABLE VI.—INTER-CORRELATIONS BETWEEN SCORES ON TRIAL 4

Series III
Test     Test VIII     Test IX       Test X
VIII     ....          .49 (.05)     .38 (.05)
IX       .49 (.05)     ....          .68 (.03)
X        .38 (.05)     .68 (.05)     ....



It will be noted that these correlations are all positive and in some 
instances fairly high, but obviously far from perfect. They certainly 
present a wide range in size, from .01 to .68. It is interesting to find 
that the highest coefficients are those obtained with material pre- 
sented in an auditory fashion, but we have not sufficient evidence to 
indicate whether this is due merely to the method of presentation or to 
the nature of the materials used. The former seems the more plausible
explanation, however, since the materials used in Series III were
very similar to some of those employed in the other tests presented
visually.


VII. THE RELATION OF FORM TO RETENTION AND INTELLIGENCE 


This part of the study has to do with the materials of Test VI. 
The first problem was to seek to discover which forms were best 
learned and retained. Figure 2 gives the findings in this connection 
and should be read as follows: No. 1, the division sign, was best retained, 
No. 2, next, and so forth. Examination and interpretation of these
results indicate that the nine best remembered figures constitute
complete wholes, in the Gestalt terminology, not merely parts as some of
the later figures do, the semi-circles and quadrant for example. The 
fact of their better persistence would be accounted for by the con- 
figurationists no doubt on the basis of their greater stability, symmetry 
and organization. Another explanation may be found in the compar- 
ative ease of naming the various forms and figures, thus emphasizing 
again the importance of verbalization in learning and retention. 

An investigation was also conducted involving a comparison of 
figures retained best by subjects in the highest and lowest twenty 
per cent in intelligence respectively, in order to discover any possible 
difference in form preference in the two intelligence groups. The 
method used involved the tabulation of the number of times each 
figure was correctly reproduced on each trial by pupils in the upper and 
lower groups. The rank of each figure in each group was thus secured 
and these ranks were used for the calculation of coefficients of
correlation by the Spearman rank-order method. Table VII shows the
coefficients thus obtained. 

From the size of the coefficients we are justified in concluding that
there are no outstanding form preferences characteristic of high or of
low intelligence within the particular group studied. In other words,
in general, there is a significant tendency for forms best retained by the
superior group in intelligence to be best retained also by the inferior
group.


TABLE VII.—RANK OF FIGURES FOR UPPER GROUP WITH THEIR RANK FOR LOWER
GROUP

Trial      Coefficient
1          .74
2          .81
3          .66
4          .77
Totals     .84
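
The rank-order procedure described above can be sketched as follows, using the Spearman formula ρ = 1 - 6Σd²/(n(n² - 1)). This is an illustration under stated assumptions: the reproduction counts are hypothetical (the paper reports only the resulting coefficients), the function names are invented, and tied counts are given average ranks, for which the simple formula is only approximate.

```python
def average_ranks(counts):
    """Rank 1 goes to the largest count; tied counts share the average rank."""
    order = sorted(range(len(counts)), key=lambda i: -counts[i])
    ranks = [0.0] * len(counts)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and counts[order[end + 1]] == counts[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rho(upper_counts, lower_counts):
    """Spearman rank-order coefficient from two lists of per-figure counts."""
    n = len(upper_counts)
    upper_ranks = average_ranks(upper_counts)
    lower_ranks = average_ranks(lower_counts)
    sum_d_squared = sum((u - v) ** 2 for u, v in zip(upper_ranks, lower_ranks))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


# Hypothetical counts of correct reproductions for five figures.
upper_group = [10, 8, 6, 4, 2]
lower_group = [9, 10, 5, 3, 1]
print(f"rho = {spearman_rho(upper_group, lower_group):.2f}")
```

A coefficient near 1 means that the two groups order the figures in nearly the same way, which is how the writer reads the values in Table VII.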

We have presented in this report the more outstanding theories in
the field of memory, together with a survey of the significant
experimental findings on this topic. The conclusion most clearly indicated,
however, is the need for further experimentation.
THE MEASUREMENT OF TRAIT DIFFERENCES IN
RELATION TO THEIR IMPORTANCE FOR 
CLASSIFICATION OF PUPILS 


ETHEL L. CORNELL 
Educational Research Division, New York State Education Department 


How shall we measure individual idiosyncrasy so that it conveys a 
meaning as to the range of the teacher’s job? If we measure it only in 
terms of age-norms, we cannot measure the statistical reliability of the 
differences found. If we use z-scores, which are capable of more 
statistical manipulation and interpretation, we get no concrete state- 
ments that are meaningful for the teacher. It does not help the 
teacher to know, for example, that Johnnie Jones’s ability in reading is
1.2σ above the mean of all the ten-year olds in his community, while
in arithmetic computation he is .4σ below the mean. What, actually,
is the difference she has to deal with, from -.4σ to +1.2σ? Is it a
difference of half a year of attainment, a year, or a year and a half?

The attempt to transmute z-scores (scores given in terms of sigma 
distances from the mean of the group) into scores given in terms of age 
standards involves some puzzling questions. A child may not be 
variable at all in his abilities in two tests, as judged by his deviation 
from the group mean, but in terms of achievement age he may vary 
considerably from one test to another. For example, a ten-year-old
child, whose ability is 2σ above the mean of unselected ten-year-olds in
reading and 2σ above the mean of unselected ten-year-olds in
arithmetic reasoning, is not variable in these two subjects, in the
statistical sense of “variability-from-a-central-tendency.” His reading
age, however, is 15-6 and his arithmetic reasoning age is 13-2.¹ This is a
difference of more than two years in educational attainment. Does
this mean that the child should be in one group in reading and in a 
different group in arithmetic? Does it mean that children of ten are 
really more variable in their potentialities for reading than in their 
potentialities for reasoning arithmetically? Or is the difference due 
to emphasis in instruction or school opportunity more favorable to 
development in reading than in arithmetic reasoning? Or is it due 
to some factor in the way in which the tests were constructed or 
standardized? 





1 Determined from means and sigmas for unselected ten-year-olds on Stanford 
achievement given on pp. 343 and 344 of Genetic Studies of Genius, Vol. I. 
Possibly all three factors enter into the result. This paper is
presented as evidence that one or both of the last two factors are 
effective. 

In an effort to express the differences in sigma-standing of an
individual in terms of “teaching-distances” (if that term is intelligible
as an expression for differences which will show the range of the 
teacher’s task) we studied the scores on the Stanford Achievement 
Test and on group intelligence tests of all ten-year-olds in a certain 
community. While they were unselected, for that community, they 
test somewhat above the norms for unselected ten-year-olds in general 
and probably reflect the somewhat superior social status of the com- 
munity. The community was one of about fourteen thousand popula- 
tion and the ten-year-olds comprised about eleven per cent of the 
whole school enrolment. They numbered one hundred forty-seven 
and were located in Grades II to VI. 

We studied their attainments in two ways. First, we made a
distribution of the raw scores on each test¹ and obtained the means
and sigmas in terms of score-points. We then found the scores
corresponding to .5 sigma points from −2σ to +2σ, and read from the
table of norms the age-norm for these scores. This gave us a
distribution of the range of sigma-steps in age-terms as shown in
Table I. It will be noted that while the various tests differ by not
more than three months at 0σ (or the mean), the greatest difference at
−2σ is thirteen months and at +2σ, twenty-five months. That is, a
ten-year-old child may reach the norm for 16-7 in paragraph reading and
for 14-7 in spelling, and still not be “variable” at all, whereas if his
paragraph reading age is 12-0 and his arithmetic computation age 12-3,
he varies by .5σ in the two tests.
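The spreads quoted above can be checked directly against the age-equivalents of Table I. The sketch below is offered only as an arithmetic check. Two readings of the printed table are assumptions on our part: the last row is taken to be Spelling (the text cites 14-7 for spelling at +2σ), and the first entry for arithmetic computation is read as 8-2.

```python
def to_months(age):
    """Convert an age written as 'years-months' (e.g. '16-7') to months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)


STEPS = ["-2.0", "-1.5", "-1.0", "-0.5", "M", "+0.5", "+1.0", "+1.5", "+2.0"]

# Age-equivalents from Table I, one string per test, in the order of STEPS.
TABLE_I = {
    "Total score":            "7-8 8-11 9-8 10-6 11-2 11-10 12-7 13-7 14-10",
    "Total reading":          "7-10 8-11 9-8 10-5 11-2 11-11 13-2 14-11 16-1",
    "Paragraph reading":      "7-11 8-10 9-7 10-4 11-1 12-0 13-8 15-8 16-7",
    "Arithmetic computation": "8-2 8-10 9-5 10-4 10-11 11-7 12-3 13-10 15-5",
    "Arithmetic reasoning":   "7-1 8-9 9-6 10-7 11-2 11-11 12-7 13-4 14-6",
    "Spelling":               "7-3 8-5 9-5 10-6 11-2 11-10 12-7 13-5 14-7",
}


def spread_by_step(table):
    """Largest minus smallest age-equivalent (in months) at each sigma step."""
    rows = [[to_months(a) for a in row.split()] for row in table.values()]
    return {step: max(col) - min(col) for step, col in zip(STEPS, zip(*rows))}


spread = spread_by_step(TABLE_I)
print(spread["-2.0"], spread["M"], spread["+2.0"])  # 13 3 25
```

The three figures printed agree with the thirteen months, three months and twenty-five months stated in the text.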

This peculiar kind of discrepancy led us to test these results by
starting out with a different method of distribution. In this second
method, we converted raw scores into age-scores first, distributed the
achievement ages, and then calculated the mean and sigma values of
achievement ages. We thus obtained the values in Table II. This
procedure widens the age-differences in means slightly but contracts
the extreme points of −2σ and +2σ. The greatest difference at +2σ
in achievement age is seventeen months by this method, instead of
twenty-five months. It still remains possible however to have a
higher achievement age in paragraph reading than in arithmetic
computation and yet deviate more from the group average in
arithmetic computation than in paragraph reading. The peculiar effect
of this may be seen more clearly in the accompanying graph.

1 Except the intelligence test, where the units had to be in terms of mental
age since different tests were used in the primary and intermediate grades.
TABLE I.—AGE-EQUIVALENTS FOR SIGMA VALUES (SIGMA CALCULATED ON BASIS
OF DISTRIBUTION OF RAW SCORES)

                         −2.0σ  −1.5σ  −1.0σ  −.5σ   M      +.5σ   +1.0σ  +1.5σ  +2.0σ
Total score              7-8    8-11   9-8    10-6   11-2   11-10  12-7   13-7   14-10
Total reading            7-10   8-11   9-8    10-5   11-2   11-11  13-2   14-11  16-1
Paragraph reading        7-11   8-10   9-7    10-4   11-1   12-0   13-8   15-8   16-7
Arithmetic computation   8-2    8-10   9-5    10-4   10-11  11-7   12-3   13-10  15-5
Arithmetic reasoning     7-1    8-9    9-6    10-7   11-2   11-11  12-7   13-4   14-6
Spelling                 7-3    8-5    9-5    10-6   11-2   11-10  12-7   13-5   14-7

TABLE II.—AGE-EQUIVALENTS FOR SIGMA VALUES (SIGMA CALCULATED ON BASIS
OF DISTRIBUTION OF ACHIEVEMENT AGES)

                         −2.0σ  −1.5σ  −1.0σ  −.5σ   M      +.5σ   +1.0σ  +1.5σ  +2.0σ
Total score              8-1    8-10   9-7    10-4   11-2   11-11  12-8   13-5   14-3
Total reading            7-7    8-7    9-6    10-5   11-4   12-4   13-2   14-2   15-1
Paragraph reading        7-3    8-3    9-4    10-5   11-5   12-6   13-7   14-7   15-8
Arithmetic computation   7-4    8-2    9-1    10-0   10-10  11-9   12-8   13-6   14-5
Arithmetic reasoning     8-2    8-11   9-9    10-6   11-4   12-1   12-11  13-8   14-6
Spelling                 7-7    8-5    9-3    10-2   11-1   11-11  12-10  13-8   14-7
Mental age               8-0    8-10   9-8    10-6   11-4   12-2   13-0   13-10  14-8
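Why the two procedures need not agree may be seen from a small sketch. Everything in it is assumed for illustration: the norm table, the raw scores and the function names are hypothetical, and the population form of sigma is used. It shows only that when age-norms do not rise evenly with raw score, locating a sigma step on the score scale and then reading off an age differs from converting every score to an age first and taking sigma steps on the age scale.

```python
from statistics import mean, pstdev

# Hypothetical norm table: (raw score, age norm in months). Not from the paper.
NORMS = [(10, 96), (20, 108), (30, 120), (40, 138), (50, 168)]


def score_to_age(score):
    """Read an age norm for a raw score by linear interpolation (clamped)."""
    if score <= NORMS[0][0]:
        return float(NORMS[0][1])
    for (s0, a0), (s1, a1) in zip(NORMS, NORMS[1:]):
        if score <= s1:
            return a0 + (a1 - a0) * (score - s0) / (s1 - s0)
    return float(NORMS[-1][1])


def method_one(scores, k):
    """First method: take the sigma step on raw scores, then convert to age."""
    return score_to_age(mean(scores) + k * pstdev(scores))


def method_two(scores, k):
    """Second method: convert each score to an age, then take the sigma step."""
    ages = [score_to_age(s) for s in scores]
    return mean(ages) + k * pstdev(ages)


scores = [15, 20, 25, 30, 35, 40, 45]  # assumed raw scores for one test

for k in (-2, 0, 2):
    print(k, round(method_one(scores, k), 1), round(method_two(scores, k), 1))
```

With these assumed figures the second method gives a slightly higher mean age and a lower age at +2σ than the first. No more is claimed for the sketch than that the order of the two operations matters.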
Our original problem of converting differences in z-scores into
terms of “teaching-distances” seems to be impossible of solution
under these conditions. The results we find, however, seem to
make the practice of comparing a pupil’s achievement ages in
various subjects with each other or with mental age of extremely
doubtful value.
KNOWLEDGE OF RESULTS AS AN INCENTIVE IN
SCHOOL ROOM PRACTICE 


FRANCIS J. BROWN 


School of Education, New York University 


INTRODUCTION 


It is the purpose of the writer to briefly summarize and evaluate 
the literature in the field of knowledge of results as an incentive and to 
present certain data resulting from further experiment in this field.¹

A glance at the elaborate systems of incentives worked out in our 
public schools shows that the schools are awake to the need of incen- 
tives. In addition to the grading system, many schools have highly 
developed schemes of awarding stars and other honors. In some 
schools the system is planned in careful detail involving an investiga- 
tion committee of faculty and students, a careful survey of the pupils’ 
activities in intra and extra curricular work, and finally, an assembly 
of no small dramatic power at which time the honors are awarded. 
All believe that some type of incentive is of value, but little has been 
done to determine their relative value. 

Such an interest in incentives is the school’s response to the realiza- 
tion that there is a variable factor as yet unmeasured by the educa- 
tional test. Given children of comparatively the same native capacity, 
subjecting each child to the same general type of mental training under 
the same general conditions, why is not the product practically iden- 
tical? Yet measurements have shown that the dispersion of the 
distribution curve is greater after this uniform procedure than before. 
Thus we are led into an analysis of pupils’ attitudes in the attempt to 
determine at least one contributing cause of this variability of accom- 
plishment. Such an analysis of attitude can only be studied scien- 
tifically in the light of specific incentives under controlled conditions. 




THE PROBLEM HISTORICALLY 


Compared with such fields as the learning process, the experimental
work done in this field has been meager. Some experimental work
has been done, however, in the fields of general attitudes or will to
learn and also with reference to the more specific incentives such as
knowledge of results, encouragement, discouragement, rewards and
punishments.

1 The experiment was conducted by one of the writer’s graduate students,
Mrs. Lula B. Hathaway.


One of the earliest experiments in the field of attitude was carried
on by Bryan and Harter. The learning of Telegraphic Language was
the material used in the experiments and the general conclusion reached
was that the daily practice of the language led to no great
improvement unless intense effort was stimulated in some way. One
conclusion seems to stand out more clearly from these facts than any
other, namely, that in learning to interpret the telegraphic language
it is intense effort that educates, throughout the whole length of the
curve.¹


Book arrived at a very similar conclusion as a result of the collection 
of data concerning the learning of typewriting. He showed indis- 
putably that intense effort efficiently applied to details was a necessary 
condition for the improvement of typewriting. New adaptations or 
forward steps in learning were only made when intense effort was 
rightly applied to the work.²

Meumann reports an experiment carried on by Ebert and himself
in the field of learning quotations from classical and scientific literature.
Here it was shown that the rate of memory could improve very rapidly
under daily practice with stimulation, whereas it improved very little
by the practice involved in ordinary school routine. The subjects
of this experiment ranged from seven to fifty-four years of age. The
ability to improve increased from twenty per cent to forty per cent
under stimulation in the laboratory. Their conclusions are expressed
as follows: “We profit from continued practice only in proportion as
we incite the will to progress and arouse an intention to improve.”³

1 Bryan and Harter: “Studies in the Psychology and Physiology of the
Telegraphic Language.”
2 Book, W.: “The Psychology of Skill.” Pp. 121, 130, 1365.
3 Meumann and Ebert: “Economie und Technique des Lernens Archives für die
gesamte Psychologie,” Vol. IV, pp. 357-359. Reported by Meumann: “Psychology
of Learning.” Pp. 1-232.

Miss Mulhall, in studying recall and recognition, found that the
determination to remember materially increased recall memory while
it had but little effect on recognition memory. Here again is proof
of a determination to recall, proving a valuable incentive in the difficult
task of memory recall.¹

Turning from the field of more general attitude to the specific field 
of knowledge of results as an incentive we find that a great deal of 
the experimental work in incentives has centered around the problem 
of the knowledge of results. This incentive is much more easily 
controlled than any of the others and involves less intense emotional 
factors than such incentives as encouragement and discouragement. 

Arps and Wright in a series of ergographical studies demonstrated
very clearly the incentive of a knowledge of the results. Wright’s
results show that a knowledge of the score increased the subjects’
efficiency on the ergograph.²

Arps used stimulus and control series with and without knowledge
of results. He summarizes his findings as follows: “The curves show a
general agreement in that when the observer is aware of his progress
he is strikingly more efficient than when he is ignorant of his progress.”³


Judd experimented on this incentive also. His first data concern
the results of three subjects working with the Müller-Lyer Illusion until
it had been practically overcome by each. “Correction of habit
formed without recognition shows greater fluctuations than that
formed with recognition.”⁴

Judd, in some further experimental work on practice without a 
knowledge of results, found that there was little improvement. Judd 
concludes as follows: ‘‘The striking fact which appears in the results 
of these ten days is that the practice brings little, if any change. The 
first day and the last day differ from each other about the same as did 
the first and second. There is no motive for improvement.”⁵

1 Mulhall, E.: Experimental Studies in Recall and Recognition. American
Journal of Psychology, Vol. XXVI, pp. 219-224.

2 Wright: Some Effects of Incentives on Work and Fatigue. Psychological
Review, Vol. XIII, pp. 23-24.

3 Work with Knowledge of Results vs. Work without Knowledge of Results.
Psychological Monographs, Vol. XXVIII, p. 3.

4 Judd: Practice and Its Effects upon the Perception of the Illusion.
Psychological Review, Vol. IX, pp. 27-39.

5 Judd: Practice without the Knowledge of Results. Psychological Review, Vol.
VII, pp. 185-195.

In the field of reading, Stone used college students as subjects
and measured the improvement of their reading ability. He worked
with three groups of students, one of which knew individual and group
scores and had an analysis of test results, the second group knew
individual and group scores but had no analysis, while a third group were
given tests but no practice was required. Stone found that for the
students to improve “they must have the motive of knowing their own
results. Knowledge of score is a potent factor in further progress.”¹

Kirby experimented in the field chosen by the present writer, 
measuring the results of a knowledge of score in arithmetic drills. 
His conclusion is as follows: “Much of the gain must have been due to
the children knowing their previous score.”²

By far, the most significant piece of work in this field of incentives
was contributed by Book and Norvell in 1922.³ Here a series of
experiments were conducted under controlled conditions with a
knowledge of results as the only supposed variable. The subjects
of the experiment were two groups of college upper classmen. Of
the group tested, seventy-five were women and forty-eight were men.
All were given a preliminary test before beginning a practice period
extending over nine weeks. This period closed with a final test,
followed by three weeks of further drill during which the experimental
data was collected. The functions tested were four in number, making
the letter “a”; a crossout word test; an association test; a
multiplication of two place numbers by two place numbers. The stimulus
group was in each case informed of their results while the control group
was kept in ignorance of their scores. Control and stimulus groups
were reversed after two weeks. Thus each group had two weeks with
knowledge of results and one week without, or two weeks without and
one week with knowledge of results.

TABLE I.—PERCENTAGE OF IMPROVEMENT OF STIMULUS GROUP ABOVE CONTROL GROUP

Method of measurement     Men     Women
[illegible]               25.3    6.2
[illegible]               91.9    63.3
[illegible]               2.4     32.2
[illegible]               22.5    12.7

1 Stone: Improving the Reading Ability of College Students. Journal of
Educational Mathematics, Vol. II, pp. 8-23.

2 “Practice in the Case of School Children,” Teachers College Contribution 58,
p. 43.

3 Book and Norvell: The Will to Learn. Pedagogical Seminary, Vol. XXIX, pp.
308-309.

The authors draw the following conclusions: “All stimulus sections
increased their output in each experiment more than did the control 
group.” 

Book and Norvell’s data show quite clearly that a stimulus group 
which has been making rapid continuous progress ceases to do so when 
the incentives are removed and the exact reverse is true of the control 
group because, given an incentive, improvement suddenly begins and 
becomes rapid and continuous. 

Ladd and Woodworth in summarizing some of the experimental 
material in this field state: ‘‘ Experimental conditions are stimulating 
largely because one has a measure of one’s success and progress— 
checking up one’s work can scarcely fail to prove of benefit whenever 
measures of success and failure are practicable.”¹


A CRITICISM AND RESTATEMENT OF THE PROBLEM 


All the experimental work in the field of incentives is open to 
serious criticism. Book and Norvell’s experiment was by far the 
most significant and scientific piece of work in the field, though even 
this experiment can be questioned on many points. Some very 
startling conclusions have been drawn from exceedingly meager data. 
Experiments, where but three people are examined, can scarcely yield 
results applicable to the whole field of education. The following 
criticisms may be made of the experiments reviewed above. While 
perhaps none of the experiments are subject to all of the criticisms, all 
without exception can be criticized on one or more of the following 
counts. 

1. All the experiments with the possible exception of Kirby’s have 
been done with advanced students under unnatural conditions. 

2. For the most part, the subjects were fully aware of the experiment
and its purpose. This involves a stimulus which is not merely a
knowledge of results. In practically all cases, the unnatural conditions
of an experiment with unusual tests made the subjects aware that an
experiment was in progress.





1 Ladd and Woodworth: ‘‘Elements of Physiological Psychology.” P. 571. 
3. There was apparently no attempt to eliminate other incentives.
This criticism follows quite naturally from the previous one. In no 
case is there any effort to limit the incentives to one, that one being the 
knowledge of results. For example, in none of the reports of experi- 
mental work is there any mention of any effort being made to overcome 
competition which would be an almost inevitable factor in any groups 
working together. 

4. There is a lack of uniformity in the procedure of the experiments. 
For example, Stone’s method was invalidated by his failure to keep the 
procedure of his groups uniform. The groups had different teachers 
and there was no record of the practice in reading through other 
subjects outside the class. 

5. There seems to have been no attempt to equate control and
stimulus groups. It would seem incredible that two groups could be
used as control and stimulus without some attempt at equation. Yet
we find no mention of this in the literature. In Book and Norvell’s
experiment we find these words, “Because of the inequalities in the
initial ability of the two groups to do the tasks that were to be learned
. . .” Scientifically, it is impossible to compare the achievement
of two groups unless equated at the beginning of the experiment.

6. Practice effects could scarcely be eliminated in such unusual 
tests. With the exception perhaps of Kirby’s experiment it would 
seem that such unusual tests must of necessity produce practice effects. 
In such a case as Kirby’s, however, there would seem to be no reason 
for expecting any measurable quantity of improvement in two weeks 
when the practice of arithmetic drill has been familiar throughout the 
school life of the group. 

7. A definite incentive was given to the control group. The
control group, knowing the purpose of the experiment, could not help
being interested in their scores. It would be impossible for them
deliberately to forget them. Almost unconsciously there must have
been the effort to continue beating their own scores as they had been
incited to do when acting as the stimulus group.

8. Attempts to eliminate other incentives tended to increase rather 
than to decrease their effectiveness. It is an acknowledged fact that a
statement, “forget your higher score and think nothing of
improvement,” would tend to fixate the individual’s previous score,
rather than to erase it from his mind.
It is in view of the foregoing criticism that the following
experiment was conducted to test knowledge of results as an incentive
under conditions against which these criticisms could not be
leveled.


METHOD 


Subjects—The subjects for the experimental work were one 
hundred thirty-eight children of a large public school system. In 
the first experiment the children (boys and girls) were in the 7A 
grade of one of the junior high schools. In the second experiment 
they were 5A grade children in an elementary school. Incidentally, 
both schools were in the foreign section of the city, and by far the 
larger part (in fact, 84.3 per cent) of the children were of foreign 
parentage. 

Types of Acquisition.—The type of knowledge studied for the test 
was a drill on arithmetic. In both cases great care was taken to 
see that the work had been learned in other grades. In the lower 
grades the papers were based entirely on the fundamentals while 
in the seventh grade the test papers included fractions and 
decimals. 

The Devising of Tests.—The tests given were devised by the
mathematics teacher of the grade and carefully duplicated by the
experimenter through interchanging the figures in such a manner as to keep
the problems of equal difficulty. Where a fundamental example
involved a problem (e.g., a subtraction of figures from zeros such as in
9000 − 4897) the same problem was reproduced in every test paper,
changing the order of numbers in the subtrahend. By this method it
was insured that no new work was given by reason of the unfamiliarity
of the experimenter with the public school syllabus. In every case the
test paper was put into the hands of the mathematics teacher in time
for her approval before giving it to the children.

TABLE II.—COMPARATIVE RATINGS OF 7A GROUP

Group   Average age            Average Terman score   Average teacher estimate on five point scale
7A-2    12 years 8.0 months    57.9                   2.8
7A-4    12 years 9.5 months    68.0                   3.2

Selection of Groups.—In the first experiment the groups were 
selected on the basis of their combined test score and teacher rating. 
Table II shows the comparative ratings of the 7A experimental 
groups. 

In the case of the second experiment the 5A group children had not 
been tested and therefore there were no available data concerning their 
intelligence rating. The groups were equated mentally and by 
achievement, however, on the combined teachers’ estimates together 
with that of the principal. The difference between the groups in the 
first experiment and the artificial equating in the second experiment, in 
no way invalidated the result. As will follow later, there are two ways 
of comparing the groups. In the first method each grade as a stimulus 
group is compared with its own performance as a control group. Thus 
the difference in the two grades does not enter into the comparative 
results. The second method used in the first experiment compares the 
low group as stimulus against the high group as control making the 
result of the incentive more, rather than less apparent. 

Time of Experiment.—Each experiment extended over a period of
twenty days. The time utilized was the ten minute drill period given
at the beginning of the arithmetic lesson in accordance with the divided
period regularly used in the school system in which the experiment was
conducted. During the first ten-day period 7A-2 acted as the stimulus
group with 7A-4 as the control group. This first half of the
experiment was conducted during December. After the Christmas vacation
the experiment was resumed but 7A-4 now became the stimulus group
while 7A-2 was the control. In the second experiment a similar
twenty-day period was given, 5A-1 being the stimulus group and 5A-2 the
control. A period between the two halves of the experiment consisted
of two and one-half weeks including the Jewish Passover and Easter
vacation. After vacation the experiment was continued with 5A-2
as the stimulus group and 5A-1 as the control. It is safe to assume that
fifth and seventh grade children would not carry over an interest in a
subject (in which there had been no attempt to interest them) after
two weeks’ vacation. To their minds it was the normal procedure and
there was no incentive to remember their scores. The arrangement is
shown below in tabular form.

TABLE III

Experiment   Group   First half of experiment   Second half of experiment
                     (ten days)                 (ten days)
I            7A-2    Stimulus                   Control
             7A-4    Control                    Stimulus
II           5A-1    Stimulus                   Control
             5A-2    Control                    Stimulus

Classroom Procedure.—The classroom procedure was made as 
normal as possible in order not to excite the children’s suspicions that 
any external influence was at work. In the stimulus groups the teacher 
announced that after the drill the children would represent their 
results graphically. This was done by means of a bar graph. There 
was no exhortation to work well but merely a statement of fact and an 
illustration of how to draw a bar graph. In the control groups the 
only remark made was that the children would pass their papers to 
the teacher’s desk instead of correcting them as they were in the 
habit of doing. In both of the experiments the mathematics teacher 
corrected the papers in order to eliminate any inaccuracy of scoring. 

Method of Scoring.—Each child was credited with the number of 
sums correct. In the first experiment the maximum was twelve and in 
the second experiment it was six. In the stimulus group, each child 
graphed the number correct for each day. 

Preparation for Experiment.—There was absolutely no preparation 
for the experiment on the part of the group involved. In all cases it 
was the normal experience of the group to have an eight or ten-minute 
drill period preceding the assignment. As the teacher set the drill 
paper in the first instance it was in accord with the usual work of the 
group. The experimenter herself was entirely unknown at the 
school, as the papers were sent to her when collected. In no way 
did the experiment vary from the classroom procedure with which 
the children had been familiar throughout the latter part of their
school life.


SPECIAL ADVANTAGES OF THE METHOD USED 


I. The device of alternating the conditions in each experiment,
making a group the stimulus group for the first half of the experiment
and the control group for the second half, eliminated all group variables
in the result. Thus the amount of influence exerted by the incentive of
knowing the results could be measured by the rate and amount of
improvement made by each group.

II. The method used in the present experiment meets the criticism 
of the former experimental work presented above. 

a. As already indicated, the work was done with children in public 
schools under the normal conditions of the classroom. All incentives 
or inhibitions arising from strange, unnatural procedure, strange 
environment and strange experimenters were eliminated. 

b. The children had no knowledge of the experiment or its purpose. 
Thus they had no incentive along the line of “Trying to make the
experiment work” or “Trying to prove the experiment would not
work.” The only factor that differed from their normal procedure was
the graphing of their results by means of bar graphs. In the case of the 
7A the children were used to graphing their results in other subjects. 

c. The lack of the bizarre in the experiment eliminated all desire for 
the children to compare their results. For the case of the control 
experiment the papers were collected immediately without chance of 
remembering and comparing answers if there had been the desire to do 
so. In the case of the stimulus group each child was given his score 
and graphed it, handing the graph back again to the teacher. Again 
the point cannot be emphasized too strongly that the lack of any 
strange factors in the classroom procedure eliminated all desire to 
compare results. 

d. The control and stimulus groups in each experiment were given 
the same number of practices so that they could be equated. Each had 
a consecutive practice period of ten days. 

e. In the case of the first experiment the groups were equated by 
their Terman score and teacher’s estimate. In the second experiment, 
the groups were equated on the basis of school marks and teacher’s 
judgment. 

f. The gain in the second half of the experiment was not due to 
practice as the process of drill on these similar examples of the same 
type and difficulty had been continuous throughout the year. The
irregularity of the graphs bears witness to the lack of practice
effects.

g. There was no danger of giving incentives to the control group
as ignorance of the experiment eliminated all difficulty of urging the
control students to forget the results they had achieved as the stimulus
group. Thus one of the weak points in the former experimental work
was avoided. There had not been the added incentive of urging the
children to beat their own scores so that this further eliminated any
carry-over of interest into the control group. The two vacation weeks
fully and finally eradicated from the children’s minds any interest in
their previous graphing.


RESULTS 


The record made by each child in each practice in both experiments 
was determined by counting the actual number of problems correctly 
solved. The original tables of data contained the name of each
subject and the score made in each of the practices for each experiment.
From this data results curves were constructed for each group and in
some cases for the individuals in the group. Due to the fact that
each group was used both as a control and as the stimulus group, these 
curves show graphically the effect of this incentive upon the rate and 
amount of gain made during both the control and experimental period. 
The absolute gain is shown by the direct increase in score. 

The following curves represent the two methods used of comparing 
the results. The first comparison shows 7A-2 and 5A-1 stimulus 
against 7A-4 and 5A-2 control groups. These groups, in each experi- 
ment, had their practices simultaneously and had the same test papers. 
The second method compares each grade’s performance as a stimulus 
group with its own performance as a control group. Thus in the first 
method of comparison the group is the only variable and in the second 
method the test papers are the variable factors. However, since 
both methods showed the same results, all possible variables were 
eliminated. 

The results of the experiment are conclusive. In every case but 
one the performance of the stimulus group is superior to that of the 
control group. This exception will be discussed in the next section. 

The results of the boys and girls were also kept separately and 
shown graphically in order to ascertain if any difference in the absolute 
gain of boys and girls could be detected. 

This first method used in comparing the results of the two groups 
has the advantage of eliminating the group variables as each group is 
compared with itself. The disadvantage of this method is the fact 





= 





Knowledge of Results as an Incentive 543 


that the test papers were a possible variable. In group 7A-2 there 
is a clear evidence of the influence that knowledge of results has on 
the work of the group. From a general average of 9.63 they dropped to 


I. RESULTS OF EXPERIMENT I 


TABLE IV.—AVERAGES ATTAINED IN EXPERIMENT I WITH GRADE AS CONSTANT FACTOR

Group     Stimulus    Control    Gain as result of incentives
7A-2        9.63        7.59         2.04
7A-4        9.18        9.25         -.07














7.59 when the incentive of knowledge of results was removed. (See 
Fig. 1.) 


TABLE V.—AVERAGES ATTAINED IN EXPERIMENT I WITH TEST PAPERS AS THE
CONSTANT FACTOR

Group     Stimulus    Control    Gain as result of incentive
7A-2        9.63        9.25          .38
7A-4        9.18        7.59         1.59














Grade 7A-4 presents somewhat contradictory evidence but through- 
out the experiment they were an unstable group. Their performance 


TABLE VI.—SEX DIFFERENCE IN ABSOLUTE GAIN THROUGH USE OF INCENTIVE
IN EXPERIMENT I

            Boys     Girls
7A-2        9.28     9.52
7A-4        8.95     9.21
Gain         .33      .31

7A-4        8.82     9.98
7A-2        7.12     8.34
Gain        1.70     1.64
Fig. 1.—Results curve for 7A-2 (boys and girls) in Experiment I.

was very uneven, their results uncertain. The faculty of the school
could throw no light on the problem. It remains, however, to be
acknowledged that some unknown element left that group unstable 
in their work. This can be seen readily by reference to the erratic 
nature of their graphs. (See Fig. 2.) 

Here the element of gain which accompanies the incentive is even 
more pronounced. The comparison of these two methods of tabulating 
results shows beyond doubt that knowledge of results causes increased 
scores. 

Sex differences were studied in this group. The graphs show in all
cases a greater absolute gain on the part of the boys than of the girls.
(See Fig. 3.) 

The difference in gain by reason of the incentive on the part of 
the boys and girls can scarcely be judged as to its significance. The 
differences are too meager for the drawing of any sweeping con- 
clusions. It is interesting to notice that the boys consistently have 
a lower general average than the girls. It is possible that there is a 
slight indication that those of lower averages tend to be more influenced 
by incentives than those whose average is uniformly high. 

The value of the incentive is made even more striking by a com- 
parison of stimulus and control results from the scores of two indi- 
viduals selected at random. 

Two such scores as these show clearly the effect of an incentive 
upon the work. Here again we find the boy more influenced by the 
incentive of knowledge than the girl. (See Figs. 4 and 5.) 


II. RESULTS OF EXPERIMENT II 


TABLE VII.—AVERAGES ATTAINED IN EXPERIMENT II WITH GRADE AS THE
CONSTANT FACTOR

Group     Stimulus    Control    Gain as result of incentive
5A-1       4.903       4.01          .902
5A-2       4.8         4.49          .39














As in the first experiment this method has the advantage of keeping 
the grade the constant factor. This second experiment shows a 
decided gain for the group with the incentive of knowing their results. 
Their average as a stimulus group dropped from 4.8 to 4.49 while
their scores increased by .9 when the stimulus was given. (See Fig. 6.)

Fig. 3.—Results curves for 7A-4 stimulus and 7A-2 control group. (Boys and girls
shown separately.)
Fig. 4.—Results curve of individual pupil, Milton A., showing the effect of the knowl-
edge of results on score.

These results again support the contention that knowledge of
results acts as an incentive producing higher scores. (See Fig. 7.)

Sex difference here supports the evidence found in Experiment I. 
The boys show a greater absolute gain than the girls. It is significant 
that the two experiments carried on with totally different conditions, 
younger children, different school history and in a different locality 
should show the same tendency on the part of the boys to be more 
influenced by incentives. (See Fig. 8 and Table IX.) 


TABLE VIII.—AVERAGES ATTAINED IN EXPERIMENT II WITH TEST PAPERS AS THE
CONSTANT FACTOR

                                 Average number correct
5A-1                                   4.903
5A-2                                   4.01
Gain as result of incentive             .893

5A-2                                   4.8
5A-1                                   4.49
Gain as result of incentive             .31














Comparison of value of incentives from scores of two individuals 
selected at random is shown in Figures 9 and 10. 


TABLE IX.—SEX DIFFERENCE IN ABSOLUTE GAIN THROUGH USE OF AN
INCENTIVE IN EXPERIMENT II

            Boys     Girls
5A-1        4.87     5.62
5A-2        4.18     4.22
Gain         .69      .40

5A-2        4.93     4.58
5A-1        4.08     3.98
Gain         .87      .60














Here the effect of the incentive, while very apparent, is not so 
marked as in the case of Milton A. and Grace S. in Experiment I. It
may be that the younger children do not respond as readily to incen-
tives, but the data are insufficient for such conclusions.

Fig. 7.—Results curves for 5A-2. (Boys and girls in Experiment II.)
Fig. 8.—Results curves for 5A-2 stimulus and 5A-1 control. (Boys and girls shown
separately.)
Fig. 9.—Results curves for individual pupil Rose S. in second experiment.


INTERPRETATION AND DISCUSSION OF THE RESULTS 


In interpreting such data as are presented in this experiment it 
is necessary to be exceedingly cautious in the drawing of conclusions. 
It is the purpose of the writer therefore merely to present the facts as 
discovered from the experiment and suggest lines along which further 
work is needed. The following conclusions may be drawn from the 
data presented: 

I. Effect of Knowledge of Results upon Achievement. 

a. With test papers the constant factor, that is, in comparing 
stimulus and control groups during the same stages of the experiment, 
the stimulus sections made more continuous gains than the control 
sections. 

b. With the group the constant factor, that is, comparing the group 
as control with its own score when a stimulus group, each section when 
knowing its results made higher scores and a more consistent gain than 
when ignorant of results. 

c. Likewise, sections which have not been subjected to the incentive 
of the knowledge of results will raise their average on the application of 
the incentive. 

The data collected show that it is reasonable to expect some increase
in score if the results of previous work are known. It is clear that the
experiment offers no basis for any quantitative conclusion as to the 
influence of the incentive of knowledge of results. The results of 
the fifth and seventh grades might suggest that younger children are 
less susceptible to such incentives but the hypothesis would require 
proof. 

II. Sex Differences. 

Although the facts are clearly in favor of boys being the more 
susceptible to this incentive, it would be unsafe to assume dogmatically 
that boys are influenced more easily by incentives than girls, without 
further extensive experimentation. 

III. Suggestions for School Room Practice Based on Experimental 
Evidence. 

a. No tests should be given unless children know the results of 
previous tests. All too frequently the teacher fails to correct each 
test completed by the child before another is given. 
b. Mere knowledge of results, while it tends to increase the score, is
not as significant an incentive as when further emphasized by the 
teacher through some such scheme as graphing the results. 

c. Close attention should be paid to the relative performances of 
boys and girls. If class room data justify the conclusion that boys 
respond more easily to incentives than girls this should be significant 
in class room teaching. 











A STUDY OF THE RELATIONSHIP BETWEEN SOME 
FACTORS WHICH AFFECT SCHOOL WORK 


G. W. McMURTREY 
Aberdeen, S. D.


This study deals with the relationship between the intelligence 
of college students and seven other factors: (1) Their marks; (2) teach-
ers’ judgment of their intelligence; (3) their judgment of one another’s
intelligence; (4) their attendance at “movies”; (5) their attendance at
dances; (6) their having “dates”; and (7) their earning of their own
expenses while in college. 

A great many studies have been made on the relationship between 
intelligence and (1) and (2), but few serious studies have been made on 
the relationship between intelligence and any one of the five other 
factors. It is sufficient to state here that the coefficients of cor- 
relation between intelligence and school marks have been found to 
range between .30 and .60, the average being below .50.¹


SECURING THE DATA 


During the winter quarter of 1928-1929 at Northern State Teachers 
College the writer asked fifteen of his colleagues who were Heads of 
Departments or Professors to write down on prepared blanks the names 
of three students of junior college rank and three of senior college 
rank whom each of the Professors believed to be “among the highest in
intelligence in their respective colleges.” To the list secured were
added by the writer the names of twelve other students, mostly of 
junior college rank (for there was a dearth of junior college students 
in the lists submitted) and who were known by the writer to be rela- 
tively low in intelligence. These names were arranged with the others 
in alphabetical order, making a list of sixty-five names. This list 
and the following request were sent to each of the forty-eight members
of the faculty of the college and to each of the sixty-five students
whose names comprised the list:


. . . Below you will find a list of names of students. Will you... put down 
opposite the name of each student your candid estimate of his IQ? Try each one 
whom you know. If you do not know a student at all, skip him; but do not skip 
any on whom you can possibly pass judgment. 





¹ Pintner: “Intelligence Testing.” P. 267.

For the benefit of students particularly it may be said that about one per cent
of the total population test below 70; five per cent 70-79; fourteen per cent 80-89; 
sixty per cent 90-109; fourteen per cent 110-119; five per cent 120-129; and one 
per cent 130 or more. The average of college students in twenty-one colleges and 
universities was found to be 111. The average of the Freshman Class in Northern 
State Teachers College in 1927-1928 was between 108 and 109. 

Your ratings will be regarded as confidential. 


After the foregoing data had been collected a list of questions 
was sent to each of the sixty-five students being studied. This 
questionnaire called for statements as to (1) part of expenses being 
earned by the student; (2) whether the student had made an athletic 
team in college; (3) frequency of attendance at public or school dances; 
(4) frequency of having “dates” while attending college; and (5)
frequency of attendance at “movies.” Each student was given an
intelligence test, the Otis SA. The college record of each of the 
sixty-five students for the full time the student had been in college was 
examined and the literal expressions of their grades were translated into 
numerical terms. This was done by arbitrarily assigning numerical 
values to the letters in the grading system as follows: A, 100; B, 90; 
C, 80; D, 70; F, 60. The number of hours of each grade was multi- 
plied by the numerical substitute and the sum of these products was 
divided by the total number of hours of credit the student had on 
record. It is true that the values assigned are not those ordinarily 
assigned; but any numerical values whatever would be arbitrary and 
these are probably quite sufficient for the purpose intended, namely, 
the finding of relationships. 
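
The conversion of letter grades to a single numerical mark, as described above, is an hour-weighted average. The following is a minimal sketch in Python, offered as a modern illustration only. The letter values are those stated in the text; the function name and the sample record are hypothetical.

```python
# Numerical values assigned to the letter grades, as stated in the text.
GRADE_VALUES = {"A": 100, "B": 90, "C": 80, "D": 70, "F": 60}


def numerical_mark(hours_by_grade):
    """Return the hour-weighted mean mark for one student.

    hours_by_grade maps a letter grade to the number of credit hours
    recorded at that grade, e.g. {"A": 12, "B": 20}.
    """
    total_hours = sum(hours_by_grade.values())
    if total_hours == 0:
        raise ValueError("student has no hours of credit on record")
    # Multiply the hours at each grade by its numerical substitute and sum.
    weighted_sum = sum(GRADE_VALUES[grade] * hours
                       for grade, hours in hours_by_grade.items())
    return weighted_sum / total_hours


# Hypothetical record: 12 hours of A, 20 hours of B, 8 hours of C.
print(numerical_mark({"A": 12, "B": 20, "C": 8}))  # 3640 / 40 = 91.0
```

Because the letter values are equally spaced, any other equally spaced scale would shift and stretch the marks without changing their correlation with other measures.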


FINDINGS AND CONCLUSIONS 


TABLE I.—INTELLIGENCE AND COLLEGE MARKS

Data compared                              Correlation    Probable error
(1) IQ and marks                               .51           ± .076
(2) IQ and faculty estimate of IQ              .58           ± .068
(3) IQ and student estimate of IQ              .59           ± .068
(4) Faculty estimate IQ, marks                 .71           ± .074





From the foregoing table it may be observed that the combined 
judgment of college Professors as to the intelligence of their students 
is slightly inferior to the combined judgment of the students as to 
their own intelligence, when the intelligence test score is used as the
criterion. 

The correlation between faculty estimate of students’ intelligence 
and the grades made by those students is much higher than the correla- 
tion between intelligence as indicated by psychological tests and col- 
lege grades. But the judgment of the Professors was made after the 
class work of the students had been done and was probably made 
largely on the basis of that work. Hence, such judgment should be 
expected to correlate to a rather high degree with quality of work. 
The correlation between IQ and college marks is no higher than it is
because (1) the quality of work done by a student depends on several
factors in addition to IQ, namely, interest, health, attendance, time to
devote to study, background of experience, will to work, etc., and (2)
teachers’ marks are by no means infallible evidence of exact quality of 
work done. The data indicate, however, that the IQ as determined 
by the test used is a valuable indication of what the student will do in 
his classes. It is, however, a much more reliable indication of what the 
student is capable of doing than of what he will actually do under 
certain college conditions. 


TABLE II.—EARNERS AND NON-EARNERS

Part of expenses earned        IQ             Scholastic standing
(1) [illegible]              115.875              91.1
(2) One-half                 115.90               86.8
(3) Three-fourths            120.43               86.5
(4) [illegible]              [illegible]          88.4





The data in Table II indicate that those who earn their expenses, 
wholly or in part, are brighter than those who earn none of their 
expenses, and those who earn most are brighter than those who earn 
least. The probable explanation appears to be that only the brighter 


TABLE III.—ATHLETES AND NON-ATHLETES

Participation                                IQ        Scholastic standing
(1) Those who made an athletic team        112.78          86.1
(2) Those who made no athletic team        116.45          89.2

students have the ability to carry successfully their school work and do
work on the outside. The duller ones find no time for earning. 

The IQ of the non-athletes is considerably higher than that of 
the more athletically inclined students. Likewise, the marks of the 
former are proportionately higher than those of the latter. It appears 
from these data that intelligence is the main factor in determining 
the higher marks on the part of the non-athletes. But with the greater 
amount of time to devote to study available to the non-athletes, 
the difference in marks should apparently be greater than it is. The 
fact that the difference in marks is not greater is probably due to the 
fact that students on athletic teams are frequently more leniently 
dealt with in assigning marks than are others. 

But how may one account for the fact that the non-athletes rank 
nearly four points higher in intelligence than do the members of athletic 
teams? It is obvious that athletics requires a different kind of ability 
from that required to do mathematics, science, etc. In fact, it is 
obvious that more intelligence, a higher type of intelligence, is required 
to succeed in academic courses than in athletics. Nothing satisfies 
like success. Those who cannot succeed, equal, excel in other subjects 
take to athletics. On the other hand, the brighter students derive 
more satisfaction from participating in activities involved in academic 
courses than they derive from the more exclusively motor processes 
involved in athletics. To them chemistry is more fascinating than 
clogging; higher mathematics is more practical than high jumping; 
and Tennyson is more satisfying than tossing a pigskin. 


TABLE IV.—DANCERS AND NON-DANCERS¹

Participation          IQ        Scholastic standing
(1) Dancers          115.4          89.00
(2) Non-dancers      113.8          88.3

¹ It is highly probable that several of the students made very conservative
statements concerning the number of dances they had attended.


The students who dance are shown by Table IV to be slightly 
brighter than those who dance not at all, yet the difference in IQ of 
only 1.6 points is slight. The non-dancers lack only .7 of a point in 
being equal to the dancers in scholarship. The difference in IQ between
the two groups should, all other things being equal, make a larger
difference in scholarship than that found. Furthermore, the non-
dancers do slightly more work toward earning their expenses than do
the dancers. Again, it may reasonably be assumed that those who
dance possess a higher degree of social graces generally than do the
total abstainers and that, being more efficient socially, they secure more
equitable, or at least more satisfactory, marks. From these facts
it seems reasonably clear that dancing is slightly conducive to a grade 
of scholarship below the possibilities of the student. 











TABLE V.—“MOVIE” FANS AND NON-FANS

Participation¹                  IQ       Scholastic standing    Expenses earned
(1) One or more a week        114.5          90.9                  .27
(2) Less than one a week      115.3          87.4                  .42

¹ Only one student attended the picture show not at all.


It will be observed that the more enthusiastic “movie” fans have
slightly lower IQ than the less enthusiastic ones, yet their scholarship
is considerably higher. The explanation appears to lie, partly at
least, in the fact that those who attend one or more “movies” a week
earn only .27 of their expenses while those of the other group earn .42
of their expenses. Hence, the conclusion that attendance at “movies”
enhances scholarship can hardly be drawn from the data.


TABLE VI.—“DATE” FANS AND NON-FANS

Participation                        IQ       Scholastic standing    Expenses earned
(1) “Dates” while in college       111.3          90.5                  .48
(2) No “dates” while in college    115.3          87.4                  .26





Strange to say, the “date” addicts rank four points lower in intelli-
gence and three points higher in scholarship than those who had no
“dates”; and those who had “dates” earned nearly twice as much
toward paying their expenses as did the other group. From the
data it appears that having “dates” is decidedly conducive to scholar-
ship. It may be possible that, under the conditions that exist at
Northern State Teachers College, having “dates” has a tonic effect
and results in greater endeavor and keener concentration. Again, as
may be the case with dancers, those who “date” may have superior
social traits generally and thereby secure more satisfactory considera- 
tion when grades are being distributed. 


TABLE VII.—MISCELLANEOUS

 (1) Average IQ sixty-five students                          114.23
 (2) Highest IQ sixty-five students                          132
 (3) No. members faculty who “spotted” brightest student     1
 (4) IQ of student rated highest by faculty                  123
 (5) IQ of student rated highest by students                 122
 (6) Rating by faculty of student who tested highest         118.5
 (7) Rating by students of student who tested highest        121.4
 (8) Highest “movie” attendance a term                       30
 (9) Largest number of dances a term                         12
(10) Largest number of “dates” a term                        60

SUMMARY 


1. The judgment of brighter college students as to their intelligence
is slightly superior to the judgment of their teachers concerning the
intelligence of the students.

2. Students who earn all or part of their expenses while attending
college are brighter than those who earn none of their expenses and
brightness increases with increase in percentage earned.

3. Members of college athletic teams are decidedly inferior intellec-
tually and scholastically to non-members.

4. College students who dance are slightly brighter than those who
do not dance and their scholarship is slightly superior.

5. Students who attend “movies” most frequently are lower in
intelligence yet higher in scholarship than those who attend less fre-
quently. Those who attend less frequently earn about twice as much
toward their expenses.

6. Those who have no “dates” while in college are decidedly
brighter than those who have “dates,” yet the scholarship of the former
is significantly lower than that of the latter.
DISCUSSIONS


NOTE ON RELIABILITY COEFFICIENTS 


A recent article in this journal entitled “The Unreliability of
Reliability Coefficients”¹ prompts this comment. In spite of the many
discussions of reliability and its relation to units of measurement, 
there is still confusion relative to the proper statistical units to use 
in judging the reliability of a given measure on a group of individuals. 
Frequently, the reliability is stated as the average error obtained when 
two independent applications of the measure are made on the same 
individuals. If the average error is small in terms of the test units, the 
measure is considered reliable, but if the average error is large the test 
is considered unreliable. 

Such a measure of reliability, stated in terms of the test unit, 
is totally misleading for it fails to take into consideration the relative 
size of the individual distinctions which the test purports to make. 
When the difference between the individuals which are being distin- 
guished by the test is small in terms of the test units then small errors 
of measurement are significant but when the individual differences 
in the trait are large, small errors of measurement are of no conse- 
quence. For example, an average error of two inches would result 
in serious errors of discrimination between individuals in their ability 
to high jump, whereas an average error of two inches in measuring 
individuals’ ability to broad jump would make practically no error in
the distinctions between the individuals being measured. 

Any evaluation of the reliability of a measurement must state the 
errors made by the test in relation to the size of the individual differ- 
ences that are to be distinguished. The reliability coefficient which 
is a function of the ratio of the differences between two independent 
measures of a trait to the variability of the trait in question is, there- 
fore, the only proper evaluation of the degree to which the test makes 
bona fide distinctions between individuals.
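
The jumping example can be put in numbers. The sketch below is a modern illustration in Python and rests on an assumption not made in the text: that reliability may be taken as the share of observed variance due to real differences between individuals, the usual classical test-theory form. The spread figures for the two events are hypothetical, and the two-inch error is treated as a standard deviation.

```python
def reliability(spread_sd, error_sd):
    """Share of observed variance that reflects real individual differences.

    spread_sd: standard deviation of the trait among individuals
    error_sd:  standard deviation of the errors of measurement
    Both must be in the same unit (here, inches).
    """
    true_variance = spread_sd ** 2
    error_variance = error_sd ** 2
    return true_variance / (true_variance + error_variance)


# Hypothetical spreads: jumpers differ by a few inches in the high jump
# but by many inches in the broad jump. The error is 2 inches in both.
high_jump = reliability(spread_sd=3.0, error_sd=2.0)
broad_jump = reliability(spread_sd=20.0, error_sd=2.0)

print(round(high_jump, 3), round(broad_jump, 3))
```

Under these assumed figures the same two-inch error leaves the high jump with a much lower coefficient than the broad jump, which is the distinction an error stated only in test units cannot show.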

In the article referred to, the author reports the correlations
between two independent measurements of several anthropometric 
traits. In addition, he reports the average deviation between the 
two independent measures of each of nine traits. He points out that 
the median error made in measuring the sternal height was 6.2 mm. 
and the reliability was .944. On the other hand, head width measures 


¹ Lincoln, E. A.: The Unreliability of Reliability Coefficients. Journal of
Educational Psychology, Vol. XXIII, January, 1932.

had a median error of only 1.7 mm. but a reliability coefficient as low
as .591. Instead of concluding that distinctions made in the deter- 
mination of sternal height were more dependable than the distinctions 
made by the head measures, he concludes that the reliability coeffi- 
cients are not trustworthy measures of the accuracy of the tests. 
But this conclusion ignores the fact that differences between the head 
width of individuals are not nearly as large as differences in sternal 
height. 

Our purpose in bringing this issue to print is to establish a basic 
principle of biometric method. Our zeros are seldom true zeros and 
the units of measurement of the scales used are never comparable. 
Consequently, comparisons must be made after deviations in each
variable have been translated into multiples of some standard measure
of variability.
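
The translation into multiples of a standard measure of variability can be sketched as follows. This is a modern Python illustration, not the authors' procedure; the standard deviation is used as the standard measure, the function names are assumptions, and the two sets of measurements are hypothetical. Only the errors of 6.2 mm. and 1.7 mm. come from the article under discussion.

```python
from statistics import mean, pstdev


def spread_of(values):
    """Standard deviation of the measurements, guarded against zero spread."""
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("all values are equal; there is no variability to scale by")
    return spread


def in_sd_units(values):
    """Express each measurement as a deviation from the mean in SD units."""
    center = mean(values)
    spread = spread_of(values)
    return [(v - center) / spread for v in values]


def error_in_sd_units(error, values):
    """Express an error of measurement as a multiple of the trait's SD."""
    return error / spread_of(values)


# Hypothetical measurements in millimetres for five individuals.
sternal_height_mm = [1290, 1310, 1330, 1350, 1370]
head_width_mm = [148, 149, 150, 151, 152]

print(in_sd_units(sternal_height_mm))
print(in_sd_units(head_width_mm))
print(error_in_sd_units(6.2, sternal_height_mm))
print(error_in_sd_units(1.7, head_width_mm))
```

With these assumed samples the two traits yield identical deviations once each is scaled by its own variability, and the smaller error in millimetres proves the larger one relative to the differences being distinguished.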


RAYMOND FRANZEN 


MAYHEW DERRYBERRY.
American Child Health Association, New York City. 





